Block Join Score Highlighting

2014-05-01 Thread StrW_dev
Hello,

I am trying out block joins for my index at the moment as I have many
documents that are mainly variations of the same search content. In my case
denormalization is not an option, so I am using nested documents.

The structure looks like this:
<doc>
  content
  <doc>
    filter
    boost
    required info
  </doc>
</doc>

I search within the parent document and filter on the child documents. I get
the correct documents this way, but I have issues with scoring and
highlighting.
I am currently searching on the parent document and returning the child
document, as they hold specific information I require. I use the boost field
of the child to boost the score of the documents individually.
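
For illustration, the queries are of roughly this shape (a sketch; the
doctype, content and filter field names here are made up, using the
{!child}/{!parent} block join parsers):

q={!child of="doctype:parent"}content:someterm&fq=filter:somevalue

and, the other way around:

q={!parent which="doctype:parent"}filter:somevalue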

1. I want the highlighting snippet from the parent document, but the
snippets returned are empty as they are based on the children.
2. I also want to use the score from the parent document search together
with the child boost, but now I only get the score from filtering the child
nodes (which is 0).

I also tried it the other way around; returning the parent node and only
filtering on the child node, but in that case I can't boost on the specific
child or return the information within that child that I need.

Are there options to work around these issues? Or are they just not
supported at the moment?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Block-Join-Score-Highlighting-tp4134045.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to Add a New core

2014-05-01 Thread Sohan Kalsariya
Hello All.
How do I add a new core in Solr?
My Solr directory is:
/usr/share/solr-4.6.1/example/solr
and it has only one collection, i.e. collection1.
Now, to add the new core, I added a directory collection2 and in it I
created two more directories:
/conf  /lib
Now, what should the entry in the solr.xml file be?

<solr>

  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>

</solr>


What to add in this file to register the new core?
Please guide me.

-- 
Regards,
*Sohan Kalsariya*


Getting min and max of a solr field for each group while doing field collapsing/result grouping

2014-05-01 Thread Varun Gupta
Hi,

I am using SolrCloud for getting results grouped by a particular field.
Now, I also want to get min and max value for a particular field for each
group. For example, if I am grouping results by city, then I also want to
get the minimum and maximum price for each city.

Is this possible to do with Solr?

Thanks in Advance!

--
Varun Gupta


Re: Getting min and max of a solr field for each group while doing field collapsing/result grouping

2014-05-01 Thread Ahmet Arslan
Hi Varun,

I think you can use group.truncate=true with the stats component:
http://wiki.apache.org/solr/StatsComponent

From the wiki: "If true, facet counts are based on the most relevant document
of each group matching the query. Same applies for StatsComponent. Default is
false. Supported from Solr 3.4 and up."
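
A request along these lines might do it (a sketch; the city and price field
names are from your example):

q=*:*&group=true&group.field=city&group.truncate=true&stats=true&stats.field=price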

On Thursday, May 1, 2014 12:30 PM, Varun Gupta varun.vgu...@gmail.com wrote:

Hi,

I am using SolrCloud for getting results grouped by a particular field.
Now, I also want to get min and max value for a particular field for each
group. For example, if I am grouping results by city, then I also want to
get the minimum and maximum price for each city.

Is this possible to do with Solr?

Thanks in Advance!

--
Varun Gupta


Re: Getting min and max of a solr field for each group while doing field collapsing/result grouping

2014-05-01 Thread Varun Gupta
Hi Ahmet,

Thanks for the information! But as per Solr documentation, group.truncate
is not supported in distributed searches and I am looking for a solution
that can work on SolrCloud.

--
Varun Gupta

On Thu, May 1, 2014 at 4:12 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Varun,

 I think you can use group.truncate=true with the stats component:
 http://wiki.apache.org/solr/StatsComponent


 From the wiki: "If true, facet counts are based on the most relevant document
 of each group matching the query. Same applies for StatsComponent. Default is
 false. Supported from Solr 3.4 and up."

 On Thursday, May 1, 2014 12:30 PM, Varun Gupta varun.vgu...@gmail.com
 wrote:

 Hi,

 I am using SolrCloud for getting results grouped by a particular field.
 Now, I also want to get min and max value for a particular field for each
 group. For example, if I am grouping results by city, then I also want to
 get the minimum and maximum price for each city.

 Is this possible to do with Solr?

 Thanks in Advance!

 --
 Varun Gupta



Re: How to Add a New core

2014-05-01 Thread Shawn Heisey
On 5/1/2014 1:49 AM, Sohan Kalsariya wrote:
 Hello All.
 How do I add a new core in Solr?
 My Solr directory is:
 /usr/share/solr-4.6.1/example/solr
 and it has only one collection, i.e. collection1.
 Now, to add the new core, I added a directory collection2 and in it I
 created two more directories:
 /conf  /lib
 Now, what should the entry in the solr.xml file be?
 
 <solr>

Your solr.xml is in the new format.  This format became usable in 4.4.0, and
the solr.xml in the example was upgraded in 4.4 to use it.  The new
solr.xml format means that Solr is doing core discovery.

This means that you don't add anything to solr.xml at all.

One way to do it: Create a core.properties file in the collection2
directory that is similar to the one you'll find in the collection1
directory, and restart Solr.  The original core and the new core will
both be discovered because Solr is looking for the core.properties file.

http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29

Another way to do it: Once the conf directory exists with schema.xml,
solrconfig.xml, and any other config files that those reference, you can
also use the CoreAdmin API, which is exposed in the Solr admin UI.  You
need the CREATE action.  This should create the core.properties file and
add the core without requiring a Solr restart.
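
For example, a bare-bones core.properties for the new core could be as simple
as this single line (assuming you want the core named collection2):

name=collection2

and the CoreAdmin CREATE call would look something like this (host and paths
here are illustrative):

http://localhost:8983/solr/admin/cores?action=CREATE&name=collection2&instanceDir=collection2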

Thanks,
Shawn



Searching for tokens does not return any results

2014-05-01 Thread Yetkin Ozkucur
Hello everyone,

I am new to SOLR and this is my first post on this list.
I have been working on this problem for a couple of days. I tried everything
I found on Google, but it looks like I am missing something.

Here is my problem:
I have a field called: DBASE_LOCAT_NM_TEXT
It contains values like: CRD_PROD
The goal is to be able to search this field either by the exact string
CRD_PROD or by a part of it (tokenized by _), like CRD or PROD.

Currently: 
This query returns results: q=DBASE_LOCAT_NM_TEXT:CRD_PROD
But this does not: q=DBASE_LOCAT_NM_TEXT:CRD
I want to understand why the second query does not return any results.

Here is how I configured the field:
<field name="DBASE_LOCAT_NM_TEXT" type="text_general" indexed="true"
stored="true" required="false" multiValued="false"/>

And here is how I configured the field type:
<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am also using the analysis panel in the SOLR admin console. It shows this:
WT  CRD_PROD

WDF CRD_PROD
CRD
PROD
CRDPROD

SF  CRD_PROD
CRD
PROD
CRDPROD

LCF crd_prod
crd
prod
crdprod

SKMF crd_prod
crd
prod
crdprod

RDTF crd_prod
crd
prod
crdprod


I am not sure if it is related or not, but this index was created by a Java
program using the Lucene API. It used StandardAnalyzer for writing, and the
field was configured as tokenized, indexed and stored.  Does this affect the
SOLR configuration?

Can you please help me understand what I am missing and how I can debug it?

Thanks,
Yetkin


Re: Block Join Score Highlighting

2014-05-01 Thread Mikhail Khludnev
Hello,

Score support is addressed at
https://issues.apache.org/jira/browse/SOLR-5882.
Highlighting is another story. Be aware of
http://heliosearch.org/expand-block-join/ ; it might be of some use for your
problem.


On Thu, May 1, 2014 at 11:32 AM, StrW_dev r.j.bamb...@structweb.nl wrote:

 Hello,

 I am trying out block joins for my index at the moment as I have many
 documents that are mainly variations of the same search content. In my case
 denormalization is not an option, so I am using nested documents.

 The structure looks like this:
 <doc>
   content
   <doc>
     filter
     boost
     required info
   </doc>
 </doc>

 I search within the parent document and filter on the child documents. I
 get
 the correct documents this way, but I have issues with scoring and
 highlighting.
 I am currently searching on the parent document and returning the child
 document, as they hold specific information I require. I use the boost
 field
 of the child to boost the score of the documents individually.

 1. I want the highlighting snippet from the parent document, but the
 snippets returned are empty as they are based on the children.
 2. I also want to use the score from the parent document search together
 with the child boost, but now I only get the score from filtering the child
 nodes (which is 0).

 I also tried it the other way around; returning the parent node and only
 filtering on the child node, but in that case I can't boost on the specific
 child or return the information within that child that I need.

 Are there options to work around these issues? Or are they just not
 supported at the moment?




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Block-Join-Score-Highlighting-tp4134045.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Searching for tokens does not return any results

2014-05-01 Thread Ahmet Arslan
Hi Yetkin,

You are on the right track by examining the analysis page. How is your query
analyzed by the query analyzer?

According to what you pasted, q=CRD should return your example document.

Did you change something in schema.xml and forget to restart Solr and
re-index?

By the way, a simple letter-based lowercase tokenizer seems a better fit for
your use case. With it you don't have to deal with WDF's parameters.

https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-LowerCaseTokenizer
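
A sketch of such a field type (untested; the name text_lower is made up):

<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>

It splits on anything that is not a letter and lowercases in one step, so
CRD_PROD becomes the tokens crd and prod at both index and query time.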

Ahmet





On Thursday, May 1, 2014 5:04 PM, Yetkin Ozkucur yetkin.ozku...@asg.com wrote:
Hello everyone,

I am new to SOLR and this is my first post on this list.
I have been working on this problem for a couple of days. I tried everything
I found on Google, but it looks like I am missing something.

Here is my problem:
I have a field called: DBASE_LOCAT_NM_TEXT
It contains values like: CRD_PROD
The goal is to be able to search this field either by the exact string
CRD_PROD or by a part of it (tokenized by _), like CRD or PROD.

Currently: 
This query returns results: q=DBASE_LOCAT_NM_TEXT:CRD_PROD
But this does not: q=DBASE_LOCAT_NM_TEXT:CRD
I want to understand why the second query does not return any results.

Here is how I configured the field:
field name=DBASE_LOCAT_NM_TEXT type=text_general indexed=true 
stored=true required=false multiValued=false/

And Here is how I configured the field type :
    fieldType name=text_general class=solr.TextField 
positionIncrementGap=100
  analyzer type=index
  filter class=solr.WordDelimiterFilterFactory preserveOriginal=1 
generateWordParts=1 generateNumberParts=1 catenateWords=1 
catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/
   tokenizer class=solr.WhitespaceTokenizerFactory/
    filter class=solr.StopFilterFactory  ignoreCase=true 
words=stopwords.txt/
 filter class=solr.LowerCaseFilterFactory/
    filter class=solr.KeywordMarkerFilterFactory 
protected=protwords.txt/
    filter class=solr.RemoveDuplicatesTokenFilterFactory/
  /analyzer
  analyzer type=query
    filter class=solr.WordDelimiterFilterFactory preserveOriginal=1 
generateWordParts=1 generateNumberParts=1 catenateWords=0 
catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/
    tokenizer class=solr.WhitespaceTokenizerFactory/
    filter class=solr.StopFilterFactory ignoreCase=true 
words=stopwords.txt/

    filter class=solr.LowerCaseFilterFactory/
    filter class=solr.KeywordMarkerFilterFactory 
protected=protwords.txt/
    filter class=solr.RemoveDuplicatesTokenFilterFactory/

  /analyzer
    /fieldType

I am also using the analysis panel in the SOLR admin console. It shows this:
WT    CRD_PROD

WDF    CRD_PROD
    CRD
    PROD
    CRDPROD

SF    CRD_PROD
    CRD
    PROD
    CRDPROD

LCF    crd_prod
    crd
    prod
    crdprod

SKMF    crd_prod
    crd
    prod
    crdprod

RDTF    crd_prod
    crd
    prod
    crdprod


I am not sure if it is related or not, but this index was created by a Java
program using the Lucene API. It used StandardAnalyzer for writing, and the
field was configured as tokenized, indexed and stored.  Does this affect the
SOLR configuration?
    
Can you please help me understand what I am missing and how I can debug it?

Thanks,
Yetkin


Roll up query with original facets

2014-05-01 Thread Darin Amos
Hello All,

I am having a query issue I cannot seem to find the correct answer for. I am 
searching against a list of items and returning facets for that list of items. 
I would like to group the result set on a field such as a “parentItemId”. 
parentItemId maps to other documents within the same core. I would like my 
query to return the documents that match parentItemId, but still return the 
facets of the original query.

Is this possible with Solr 4.3, which I am running? I can provide more details
if needed. Thanks!

Darin

Please add me as Solr Contributor

2014-05-01 Thread Keith Thoma
my wiki username is KeithThoma

Please add me to the list so I will be able to make updates to the Solr
Wiki.


Keith Thoma


Re: timeAllowed is not honoring

2014-05-01 Thread Shawn Heisey
On 4/30/2014 5:53 PM, Aman Tandon wrote:
 Shawn - Yes we have some plans to move to SolrCloud, Our total index size
 is 40GB with 11M of Docs, Available RAM 32GB, Allowed heap space for solr
 is 14GB, the GC tuning parameters using in our server
 is -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

This means that you have about 18GB of RAM left over to cache a 40GB
index.  That's less than 50 percent.  Every index is different, but this
is in the ballpark of where performance problems begin.  If you had 48GB
of RAM, your performance (not counting possible GC problems) would
likely be very good.  64GB would be ideal.

Your only GC tuning is switching the collector to CMS.  This won't be
enough.  When I had a config like this and heap of only 8GB, I was
seeing GC pauses of 10 to 12 seconds.

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

One question: Do you really need 14GB of heap?  One of my servers has a
total of 65GB of index (54 million docs) with a 7GB heap and 64GB of
RAM.  Currently I don't use facets, though.  When I do, they will be
enum.  If you switch all your facets to enum, your heap requirements may
go down.  Decreasing the heap size will make more memory available for
index caching.

Thanks,
Shawn



Re: Please add me as Solr Contributor

2014-05-01 Thread Stefan Matheis
I’ve added you, Keith. Go ahead :)

-Stefan  


On Thursday, May 1, 2014 at 4:42 PM, Keith Thoma wrote:

 my wiki username is KeithThoma
  
 Please add me to the list so I will be able to make updates to the Solr
 Wiki.
  
  
 Keith Thoma  



Re: overseer queue clogged

2014-05-01 Thread ryan.cooke
I saw an overseer queue clogged as well due to a bad message in the queue.
Unfortunately this went unnoticed for a while until there were 130K messages
in the overseer queue. Since it was a production system we were not able to
simply stop everything and delete all Zookeeper data, so we manually deleted
messages by issuing commands directly through the zkCli.sh tool. After all
the messages had been cleared, some nodes were in the wrong state (e.g.
'down' when should have been 'active'). Restarting the 'down' or 'recovery
failed' nodes brought the whole cluster back to a stable and healthy state.
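
For reference, the manual deletion was done with zkCli.sh commands along these
lines (the znode name here is illustrative):

./zkCli.sh -server zkhost:2181
ls /overseer/queue
delete /overseer/queue/qn-0000000001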

Since it can take some digging to determine backlog in the overseer queue,
some of the symptoms we saw were:
Overseer throwing an exception like "Path must not end with / character"
Random nodes throwing an exception like "ClusterState says we are the
leader, but locally we don't think so"
Bringing up new replicas timing out when attempting to fetch shard id



--
View this message in context: 
http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: overseer queue clogged

2014-05-01 Thread Mark Miller
What version are you running? This was fixed in a recent release. It can happen 
if you hit add core with the defaults on the admin page in older versions.

-- 
Mark Miller
about.me/markrmiller

On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:

I saw an overseer queue clogged as well due to a bad message in the queue.  
Unfortunately this went unnoticed for a while until there were 130K messages  
in the overseer queue. Since it was a production system we were not able to  
simply stop everything and delete all Zookeeper data, so we manually deleted  
messages by issuing commands directly through the zkCli.sh tool. After all  
the messages had been cleared, some nodes were in the wrong state (e.g.  
'down' when should have been 'active'). Restarting the 'down' or 'recovery  
failed' nodes brought the whole cluster back to a stable and healthy state.  

Since it can take some digging to determine backlog in the overseer queue,  
some of the symptoms we saw were:  
Overseer throwing an exception like "Path must not end with / character"
Random nodes throwing an exception like "ClusterState says we are the
leader, but locally we don't think so"
Bringing up new replicas timing out when attempting to fetch shard id



--  
View this message in context: 
http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
  
Sent from the Solr - User mailing list archive at Nabble.com.  


RE: Shards don't return documents in same order

2014-05-01 Thread Francois Perron
Hi Erick,

  thank you for your response.  You are right; I changed alphaOnlySort to keep
letters and numbers and to remove some articles (a, an, the).

This is the fieldType definition:

<fieldType name="alphaOnlySort" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" replace="all"
replacement="" pattern="(\b(a|an|the)\b|[^a-z,0-9])"/>
  </analyzer>
</fieldType>


Then I tested each name with the admin UI on each server, and these are the results:

server1

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

server2

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

server3

MB20140410A = mb20140410a
MB20140411A = mb20140411a
MB20140410A-New = mb20140410anew

Unfortunately, all results are identical, so is there a way to view the data
actually indexed in these documents?  Could it be a problem with a particular
server?  All configs are in ZooKeeper, so all cores should have the same
config, right?  Is there any way to force a replica to resynchronize?

Regards,

Francois.


From: Erick Erickson [erickerick...@gmail.com]
Sent: April 30, 2014 16:36
To: solr-user@lucene.apache.org
Subject: Re: Shards don't return documents in same order

Hmmm, take a look at the admin/analysis page for these inputs for
alphaOnlySort. If you're using the stock Solr distro, you're probably
not considering the effects of PatternReplaceFilterFactory, which is
removing all non-letters. So these three terms reduce to

mba
mba
mbanew

You can look at the actual indexed terms by the admin/schema-browser as well.

That said, unless you transposed the order because you were
concentrating on the numeric part, the doc with MB20140410A-New should
always be sorting last.

All of which is irrelevant if you're doing something else with
alphaOnlySort, so please paste in the fieldType definition if you've
changed it.

What gets returned in the doc for _stored_ data is a verbatim copy,
NOT the output of the analysis chain, which can be confusing.

Oh, and Solr uses the internal lucene doc ID to break ties, and docs
on different replicas can have different internal Lucene doc IDs
relative to each other as a result of merging so that's something else
to watch out for.

Best,
Erick

On Wed, Apr 30, 2014 at 1:06 PM, Francois Perron
francois.per...@ticketmaster.com wrote:
 Hi guys,

   I have a small SolrCloud setup (3 servers, 1 collection with 1 shard and 3
 replicas).  In my schema, I have an alphaOnlySort field with a copyfield.

 This is a part of my managed-schema :

 <field name="_root_" type="string" indexed="true" stored="false"/>
 <field name="_uid" type="string" multiValued="false" indexed="true"
 required="true" stored="true"/>
 <field name="_version_" type="long" indexed="true" stored="true"/>
 <field name="event_id" type="string" indexed="true" stored="true"/>
 <field name="event_name" type="text_general" indexed="true"
 stored="true"/>
 <field name="event_name_sort" type="alphaOnlySort"/>

 with the copyField

   <copyField source="event_name" dest="event_name_sort"/>


 The problem is : I query my collection with a sort on my alphasort field but 
 on one of my servers, the sort order is not the same.

 On server 1 and 2, I have this result :

 <doc>
 <str name="event_name">MB20140410A</str>
 </doc>
 <doc>
 <str name="event_name">MB20140410A-New</str>
 </doc>
 <doc>
 <str name="event_name">MB20140411A</str>
 </doc>



 and on the third one, this :

 <str name="event_name">MB20140410A</str>
 </doc>
 <doc>
 <str name="event_name">MB20140411A</str>
 </doc>
 <doc>
 <str name="event_name">MB20140410A-New</str>
 </doc>


 The doc named MB20140411A should be at the end ...

 Any idea ?

 Regards


HDS 4.8.0_01 released - solr tomcat distro

2014-05-01 Thread Yonik Seeley
For those Tomcat fans out there, we've released HDS 4.8.0_01,
based on Solr 4.8.0 of course.  HDS is pretty much just Apache Solr,
with the addition of a Tomcat based server.

Download: http://heliosearch.com/heliosearch-distribution-for-solr/

HDS details:
- includes a pre-configured (threads, logging, connection settings,
message sizes, etc.) and tested Tomcat-based Solr server in the
"server" directory
- start scripts can be run from anywhere, and allow passing JVM args
on command line (just like jetty, so it makes it easier to use)
- start scripts work around known JVM bugs
- start scripts allow setting the port from the command line, and default the
stop port based off of the http port (to make it easy to run multiple servers
on a single box)
- the "server" directory has been kept clean by stuffing all of
Tomcat under the server/tc directory


Getting started:
$ cd server
$ bin/startup.sh

To start on a different port (e.g. 7574):
$ cd server
$ bin/startup.sh -Dhttp.port=7574

To shut down:
$ cd server
$ bin/shutdown.sh -Dhttp.port=7574

The scripts even accept -Djetty.port=7574 to make it easier to
cut-n-paste from start examples using jetty.  The example directory
is still there too, so you can still run the jetty based server if you
want.


-Yonik
http://heliosearch.org - solve Solr GC pauses with off-heap filters
and fieldcache


Over-ride q.op setting at query time

2014-05-01 Thread Bob Laferriere
I have set q.op=AND in solrconfig.xml and use edismax. I see the matches I
would expect, except when I explicitly try to add boolean logic. When I type

termA OR termB

I am still getting the results of termA AND termB.

Am I being stupid or is this just not possible?

Thanks,

-Bob



Re: Over-ride q.op setting at query time

2014-05-01 Thread Ahmet Arslan


Hi Bob,

Can you paste the output of debugQuery=true?


On Thursday, May 1, 2014 8:00 PM, Bob Laferriere spongeb...@icloud.com wrote:

I have set q.op=AND in solrconfig.xml and use edismax. I see the matches I
would expect, except when I explicitly try to add boolean logic. When I type

termA OR termB

I am still getting the results of termA AND termB.

Am I being stupid or is this just not possible?

Thanks,

-Bob


Falling back on SlowFuzzyQuery

2014-05-01 Thread Brian Panulla
I'm working on upgrading our Solr 3 applications to Solr 4. The last piece
of the puzzle involves the change in how fuzzy matching works in the new
version. I'll have to rework how a key feature of our application is
implemented to get the same behavior with the new FuzzyQuery as I did in
the old version. I'd love to be able to get the rest of the system upgraded
first and deal with that separately.

I found a previous discussion pointing towards using SlowFuzzyQuery from
the sandbox package:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/201308.mbox/%3C03be01ce98f7$da6c0760$8f441620$@thetaphi.de%3E

Can someone provide a tip on how one might re-introduce SlowFuzzyQuery?
After a brief search of the configuration options it doesn't appear to be
an obvious direct swap of the class. Would I need to implement a custom
Query Parser or Query Handler, or is this something that can be accomplished
through configuration?
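
In case it helps frame the question, this is the rough shape I imagine for a
custom query parser wrapping SlowFuzzyQuery (an untested sketch; the class and
parser names are made up, and the minimum similarity is hard-coded):

import org.apache.lucene.index.Term;
import org.apache.lucene.sandbox.queries.SlowFuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class SlowFuzzyQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() {
        // Field comes from a local param, e.g. q={!slowfuzzy f=title}someterm
        String field = localParams.get("f", "text");
        return new SlowFuzzyQuery(new Term(field, qstr), 0.7f);
      }
    };
  }
}

registered in solrconfig.xml with something like:

<queryParser name="slowfuzzy" class="com.example.SlowFuzzyQParserPlugin"/>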


Re: Over-ride q.op setting at query time

2014-05-01 Thread Shawn Heisey
On 5/1/2014 10:59 AM, Bob Laferriere wrote:
 I have set q.op=AND in solrconfig.xml and use edismax. I see the matches I
 would expect, except when I explicitly try to add boolean logic. When I type

 termA OR termB

 I am still getting the results of termA AND termB.

 Am I being stupid or is this just not possible?

This is probably the following issue:

https://issues.apache.org/jira/browse/SOLR-2649

It hasn't been fixed yet.  There's a very long history laid out there. 
I've commented on it too.

Thanks,
Shawn



Re: Over-ride q.op setting at query time

2014-05-01 Thread Bob Laferriere
When using query screen:

1. "chocolate cake" results in the following:
<str name="parsedquery_toString">+(((Category2Name:chocol^40.0 |
ManfProdNum:chocolate | ProductNumber:chocolate | ProductName:chocol^100.0 |
Category3Name:chocol^80.0 | Category4Name:chocol^80.0 | Keywords:chocol^300.0 |
ProductNameGrams:chocolate^100.0 | Category1Name:chocol)
(Category2Name:cake^40.0 | ManfProdNum:cake | ProductNumber:cake |
ProductName:cake^100.0 | Category3Name:cake^80.0 | Category4Name:cake^80.0 |
Keywords:cake^300.0 | ProductNameGrams:cake^100.0 | Category1Name:cake))~2)
(ProductName:"chocol cake"^100.0) (Keywords:"chocol cake"^300.0)
(ProductNameGrams:"chocolate cake"^75.0) (Keywords:"chocol cake"^100.0)</str>
  
2. "chocolate OR cake" results in the following:
<str name="parsedquery_toString">+((Category2Name:chocol^40.0 |
ManfProdNum:chocolate | ProductNumber:chocolate | ProductName:chocol^100.0 |
Category3Name:chocol^80.0 | Category4Name:chocol^80.0 | Keywords:chocol^300.0 |
ProductNameGrams:chocolate^100.0 | Category1Name:chocol)
(Category2Name:cake^40.0 | ManfProdNum:cake | ProductNumber:cake |
ProductName:cake^100.0 | Category3Name:cake^80.0 | Category4Name:cake^80.0 |
Keywords:cake^300.0 | ProductNameGrams:cake^100.0 | Category1Name:cake))
(ProductName:"chocol cake"^100.0) (Keywords:"chocol cake"^300.0)
(ProductNameGrams:"chocolate cake"^75.0) (Keywords:"chocol cake"^100.0)</str>
  

3. if I remove q.op=AND to default to chocolate or cake:
<str name="parsedquery_toString">+((Category2Name:chocol^40.0 |
ManfProdNum:chocolate | ProductNumber:chocolate | ProductName:chocol^100.0 |
Category3Name:chocol^80.0 | Category4Name:chocol^80.0 | Keywords:chocol^300.0 |
ProductNameGrams:chocolate^100.0 | Category1Name:chocol)
(Category2Name:cake^40.0 | ManfProdNum:cake | ProductNumber:cake |
ProductName:cake^100.0 | Category3Name:cake^80.0 | Category4Name:cake^80.0 |
Keywords:cake^300.0 | ProductNameGrams:cake^100.0 | Category1Name:cake))
(ProductName:"chocol cake"^100.0) (Keywords:"chocol cake"^300.0)
(ProductNameGrams:"chocolate cake"^75.0) (Keywords:"chocol cake"^100.0)</str>

The parsed queries are identical. Do you know where the “AND” and “OR” logic
would show up in debugQuery? I would expect the same results from the query for
#2 and #3, but I get different results.

-Bob






On May 1, 2014, at 12:27 PM, Ahmet Arslan iori...@yahoo.com wrote:

 
 
 Hi Bob,
 
 Can you paste the output of debugQuery=true?
 
 
 On Thursday, May 1, 2014 8:00 PM, Bob Laferriere spongeb...@icloud.com 
 wrote:
 
 I have set q.op=AND in solrconfig.xml and use edismax. I see the matches I
 would expect, except when I explicitly try to add boolean logic. When I type
 
 termA OR termB
 
 I am still getting the results of termA AND termB.
 
 Am I being stupid or is this just not possible?
 
 Thanks,
 
 -Bob



XSLT Caching Warning

2014-05-01 Thread Christopher Gross
I get this warning when Solr (4.7.2) Starts:
WARN  org.apache.solr.util.xslt.TransformerProvider - The
TransformerProvider's simplistic XSLT caching mechanism is not appropriate
for high load scenarios, unless a single XSLT transform is used and
xsltCacheLifetimeSeconds is set to a sufficiently high value.

The solrconfig.xml setting is:
  <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
    <int name="xsltCacheLifetimeSeconds">10</int>
  </queryResponseWriter>

Is there a different class that I should be using?  Is there a higher
number than 10 that will do the trick?

Thanks!

-- Chris


Question about Facets in Solr Cloud and their accuracy (especially when not ordered by count)

2014-05-01 Thread Darin McBeath
I found the following discussion very helpful from back in 2012.

http://markmail.org/thread/lkl7ffi77w7hpv6n


Probably the best description I've seen for how facets are actually calculated 
in Solr Cloud.  Thanks.  I presume this is for the most part still accurate.

But I have a slightly different question.  Instead of ordering the results for
a facet by count, what if I choose to order my results by index? Will the
counts still be correct in Solr Cloud?  I would assume the same magic (as was
described in the above link) could just as easily have been applied to this
approach. I assume that I could apply a prefix to the facet when ordering by
index (such as 'A', to identify those products beginning with an 'A') and that
the counts would still be correct.
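
For example, a request of this shape is what I have in mind (a sketch; the
product_name field is made up):

facet=true&facet.field=product_name&facet.sort=index&facet.prefix=A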

Thanks.

Darin.


Question about Facets and Solr Cloud (accuracy when ordered by index)

2014-05-01 Thread Darin McBeath
I found the following discussion very helpful from back in 2012.

http://markmail.org/thread/lkl7ffi77w7hpv6n


Probably the best description I've seen for how facets are actually calculated 
in Solr Cloud.  Thanks.  I presume this is for the most part still accurate.

But I have a slightly different question.  Instead of ordering the results for
a facet by count, what if I choose to order my results by index? Will the
counts still be correct in Solr Cloud?  I would assume the same magic (as was
described in the above link) could have been applied to this approach. I assume
that I could also apply a prefix to the facet when ordering by index (such as
'A', to identify those products beginning with an 'A') and that the counts
would still be correct.

Thanks.

Darin.

Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Costi Muraru
Hi guys,

What would you say is the fastest way to import data into SolrCloud?
Our use case: each day, do a single import of a big number of documents.

Should we use SolrJ/DataImportHandler/other? Or perhaps is there a bulk
import feature in SOLR? I came upon this promising link:
http://wiki.apache.org/solr/UpdateCSV
Any idea on how UpdateCSV is performance-wise compared with
SolrJ/DataImportHandler?

If SolrJ, should we split the data in chunks and start multiple clients at
once? In this way we could perhaps take advantage of the multitude number
of servers in the SolrCloud configuration?

Either way, after the import is finished, should we do an optimize or a
commit or none (
http://wiki.solarium-project.org/index.php/V1:Optimize_command)?

Any tips and tricks to perform this process the right way are gladly
appreciated.

Thanks,
Costi


Re: Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Anshum Gupta
Hi Costi,

I'd recommend SolrJ, and parallelize the inserts. Also, it helps to set
reasonable commit intervals.
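
A minimal sketch of what that could look like with SolrJ (untested; the
ZooKeeper hosts, collection name and fields are placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // Connect through ZooKeeper so updates get routed to the shard leaders.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 500000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("name", "document " + i);
      batch.add(doc);
      if (batch.size() == 1000) {  // send in batches, not one doc at a time
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();               // or rely on autoCommit instead
    server.shutdown();
  }
}

Running a few of these clients in parallel over disjoint chunks of the input
is the easy way to parallelize.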

Just to get a better perspective
* Why do you want to do a full index everyday?
* How much of data are we talking about?
* What's your SolrCloud setup like?
* Do you already have some benchmarks which you're not happy with?



On Thu, May 1, 2014 at 1:47 PM, Costi Muraru costimur...@gmail.com wrote:

 Hi guys,

 What would you say is the fastest way to import data into SolrCloud?
 Our use case: each day, do a single import of a big number of documents.

 Should we use SolrJ/DataImportHandler/other? Or perhaps is there a bulk
 import feature in SOLR? I came upon this promising link:
 http://wiki.apache.org/solr/UpdateCSV
 Any idea on how UpdateCSV is performance-wise compared with
 SolrJ/DataImportHandler?

 If SolrJ, should we split the data in chunks and start multiple clients at
 once? In this way we could perhaps take advantage of the multitude number
 of servers in the SolrCloud configuration?

 Either way, after the import is finished, should we do an optimize or a
 commit or none (
 http://wiki.solarium-project.org/index.php/V1:Optimize_command)?

 Any tips and tricks to perform this process the right way are gladly
 appreciated.

 Thanks,
 Costi




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: timeAllowed is not honoring

2014-05-01 Thread Aman Tandon
Hi Shawn,

Please check this link:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
There is something mentioned in the facet.method wiki:

*The default value is fc (except for BoolField, which uses enum) since it
tends to use less memory and is faster than the enumeration method when a
field has many unique terms in the index.*

So can you explain how enum is faster than the default? Also, we are currently
using Solr 4.2; does that support facet.method=enum, and if not, then
which version should we pick?

We are planning to move to SolrCloud with version Solr 4.7.1, so will this
14 GB of heap be sufficient, or should we increase it?


With Regards
Aman Tandon


On Thu, May 1, 2014 at 8:20 PM, Shawn Heisey s...@elyograg.org wrote:

 On 4/30/2014 5:53 PM, Aman Tandon wrote:
  Shawn - Yes we have some plans to move to SolrCloud, Our total index
 size
  is 40GB with 11M of Docs, Available RAM 32GB, Allowed heap space for solr
  is 14GB, the GC tuning parameters using in our server
  is -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

 This means that you have about 18GB of RAM left over to cache a 40GB
 index.  That's less than 50 percent.  Every index is different, but this
 is in the ballpark of where performance problems begin.  If you had 48GB
 of RAM, your performance (not counting possible GC problems) would
 likely be very good.  64GB would be ideal.

 Your only GC tuning is switching the collector to CMS.  This won't be
 enough.  When I had a config like this and heap of only 8GB, I was
 seeing GC pauses of 10 to 12 seconds.

 http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

 One question: Do you really need 14GB of heap?  One of my servers has a
 total of 65GB of index (54 million docs) with a 7GB heap and 64GB of
 RAM.  Currently I don't use facets, though.  When I do, they will be
 enum.  If you switch all your facets to enum, your heap requirements may
 go down.  Decreasing the heap size will make more memory available for
 index caching.

 Thanks,
 Shawn




Re: XSLT Caching Warning

2014-05-01 Thread Ahmet Arslan
Hi Chris,

Looking at the source code reveals that the warning message is printed
always, independent of the xsltCacheLifetimeSeconds value.


/** singleton */
private TransformerProvider() {
  // tell'em: currently, we only cache the last used XSLT transform, and
  // blindly recompile it once cacheLifetimeSeconds expires
  log.warn("The TransformerProvider's simplistic XSLT caching mechanism is"
      + " not appropriate for high load scenarios, unless a single XSLT"
      + " transform is used and xsltCacheLifetimeSeconds is set to a"
      + " sufficiently high value.");
}





On Thursday, May 1, 2014 11:29 PM, Christopher Gross cogr...@gmail.com wrote:
I get this warning when Solr (4.7.2) Starts:
WARN  org.apache.solr.util.xslt.TransformerProvider - The
TransformerProvider's simplistic XSLT caching mechanism is not appropriate
for high load scenarios, unless a single XSLT transform is used and
xsltCacheLifetimeSeconds is set to a sufficiently high value.

The solrconfig.xml setting is:
  <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
    <int name="xsltCacheLifetimeSeconds">10</int>
  </queryResponseWriter>

Is there a different class that I should be using?  Is there a higher
number than 10 that will do the trick?

Thanks!

-- Chris


Re: timeAllowed is not honoring

2014-05-01 Thread Shawn Heisey
On 5/1/2014 3:03 PM, Aman Tandon wrote:
 Please check this link:
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
 There is something mentioned in the facet.method wiki:

 *The default value is fc (except for BoolField, which uses enum) since it
 tends to use less memory and is faster than the enumeration method when a
 field has many unique terms in the index.*

 So can you explain how enum is faster than the default? Also, we are currently
 using Solr 4.2; does that support facet.method=enum, and if not, then
 which version should we pick?

 We are planning to move to SolrCloud with version Solr 4.7.1, so will this
 14 GB of heap be sufficient, or should we increase it?

The fc method (which means fieldcache) puts all the data required to
build facets on that field into the fieldcache, and that data stays
there until the next commit or restart.  If you are committing
frequently, that memory use might be wasted.

I was surprised to read that fc uses less memory.  It may be very true
that the amount of memory required for a single call with
facet.method=enum is more than the amount of memory required in the
fieldcache for facet.method=fc, but that memory can be recovered as
garbage -- with the fc method, it can't be recovered.  It sits there,
waiting for that facet to be used again, so it can speed it up.  When
you commit and open a new searcher, it gets thrown away.

If you use a lot of different facets, the fieldcache can become HUGE
with the fc method.  *If you don't do all those facets at the same time*
(a very important qualifier), you can switch to enum and the total
amount of resident heap memory required will be a lot less.  There may
be a lot of garbage to collect, but the total heap requirement at any
given moment should be smaller.  If you actually need to build a lot of
different facets at nearly the same time, enum may not actually help.

The enum method is actually a little slower than fc for a single run,
but the java heap characteristics for multiple runs can cause enum to be
faster in bulk.  Try both and see what your results are.
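
If you want to experiment, the switch is just a request parameter, e.g.:

facet=true&facet.field=category&facet.method=enum

or per field with f.category.facet.method=enum (the category field name is
just an example).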

Thanks,
Shawn



Re: Fastest way to import big amount of documents in SolrCloud

2014-05-01 Thread Costi Muraru
Thanks for the reply, Anshum. Please see my answers to your questions below.

* Why do you want to do a full index everyday?
Not sure I understand what you mean by full index. Every day we want to
import additional documents to the existing ones. Of course, we want to
remove older ones as well, so the total amount remains roughly the same.
* How much of data are we talking about?
The number of new documents is around 500k each day.
* What's your SolrCloud setup like?
We're currently using Solr 3.6 with 16 shards and planning to switch to
SolrCloud, hence the inquiry.
* Do you already have some benchmarks which you're not happy with?
Not yet. Planning to do some tests quite soon. I was looking for some
guidance before jumping in.

Also, it helps to set reasonable commit intervals.
What do you mean by *reasonable*? Also, do you recommend using autoCommit?
We are currently doing an optimize after each import (in Solr 3), in order
to speed up future queries. This is proving to take very long though
(several hours). Doing a commit instead of optimize is usually bringing the
master and slave nodes down. We reverted to calling optimize on every
ingest.



On Thu, May 1, 2014 at 11:57 PM, Anshum Gupta ans...@anshumgupta.netwrote:

 Hi Costi,

 I'd recommend SolrJ, and parallelize the inserts. Also, it helps to set
 reasonable commit intervals.

 Just to get a better perspective
 * Why do you want to do a full index everyday?
 * How much of data are we talking about?
 * What's your SolrCloud setup like?
 * Do you already have some benchmarks which you're not happy with?



 On Thu, May 1, 2014 at 1:47 PM, Costi Muraru costimur...@gmail.com
 wrote:

  Hi guys,
 
  What would you say is the fastest way to import data into SolrCloud?
  Our use case: each day, do a single import of a big number of documents.
 
  Should we use SolrJ/DataImportHandler/other? Or perhaps is there a bulk
  import feature in SOLR? I came upon this promising link:
  http://wiki.apache.org/solr/UpdateCSV
  Any idea on how UpdateCSV is performance-wise compared with
  SolrJ/DataImportHandler?
 
  If SolrJ, should we split the data in chunks and start multiple clients
 at
  once? In this way we could perhaps take advantage of the multitude number
  of servers in the SolrCloud configuration?
 
  Either way, after the import is finished, should we do an optimize or a
  commit or none (
  http://wiki.solarium-project.org/index.php/V1:Optimize_command)?
 
  Any tips and tricks to perform this process the right way are gladly
  appreciated.
 
  Thanks,
  Costi
 



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Solr 4.7 not showing parsedQuery / parsedquery_toString information

2014-05-01 Thread Chris Hostetter

Shamik:

I'm not sure what the cause of this is, but it definitely seems like a bug
to me.  I've opened SOLR-6039 and noted a workaround for folks who don't
care about the new track debug info and just want the same debug info
that was available before 4.7...

https://issues.apache.org/jira/browse/SOLR-6039



: Date: Thu, 24 Apr 2014 11:50:37 -0700
: From: Shamik Bandopadhyay sham...@gmail.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Solr 4.7 not showing parsedQuery / parsedquery_toString information
: 
: Hi,
: 
:   Not sure if this has been a feature change, but I've observed that
: parsedquery and parsedquery_toString information are not displayed if
: the search doesn't return any result. Here's what is being returned.
: 
: <lst name="debug">
:   <lst name="track">
:     <str name="rid">54.215.121.xx-collection1-1398xx4900921-48</str>
:     <lst name="EXECUTE_QUERY">
:       <lst name="http://54.215.121.xx:8983/solr/collection1/|
: http://54.215.117.xxx:8983/solr/collection1/">
:         <str name="QTime">6</str>
:         <str name="ElapsedTime">11</str>
:         <str name="RequestPurpose">GET_TOP_IDS,GET_FACETS</str>
:         <str name="NumFound">0</str>
:         <str name="Response">{responseHeader={status=0,QTime=6,params={facet=on,tie=0.01,f.text.hl.fragsize=250,q.alt=*:*,facet.method=enum,f.ADSKAudience.facet.mincount=1,v.layout=layout,NOW=1398xx4900920,bq=Source2:sfdcarticles^3
: Source2:downloads^3 Source2:CloudHelp^2.5 Source2:blog^1
: Source2:discussion^2 Source2:documentation^1.5 Source2:youtube^1.5
: Source2:education-curriculum^2
: 
Source2:mne-help^1.5,fl=id,score,f.ADSKDocumentType.facet.limit=-1,bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0,facet.field=[ADSKProductLine,
: ADSKContentGroup, ADSKReleaseYear, ADSKHelpTopic, ADSKDocumentType,
: ADSKAudience],v.template=browse,fq=Source2:(mne-help OR CloudHelp OR
: documentation OR videos OR youtube OR discussion OR blog OR
: sfdcarticles OR downloads) AND -workflowparentid:[* TO *] AND
: -ADSKAccessMode:internal AND
: 
-ADSKAccessMode:beta,fsv=true,f.ADSKReleaseYear.facet.mincount=1,spellcheck.extendedResults=false,f.ADSKProductLine.facet.mincount=1,hl.fl=text
: 
title,wt=javabin,spellcheck.collate=true,requestPurpose=GET_TOP_IDS,GET_FACETS,rows=1,defType=edismax,f.ADSKReleaseYear.facet.limit=-1,facet.sort=index,start=0,q.op=AND,f.ADSKContentGroup.facet.mincount=1,spellcheck=true,f.ADSKContentGroup.facet.limit=-1,distrib=false,debug=track,shards.tolerant=true,hl=false,version=2,v.channel=adskhelpportal,title=Project
: Sunshine - HelpPortal Bundle,shard.url=
: http://54.215.121.xx:8983/solr/collection1/|
: 
http://54.215.117.xxx:8983/solr/collection1/,df=text,debugQuery=false,v.contentType=text/html;charset=UTF-8,spellcheck.count=5,f.text.hl.alternateField=ShortDesc,f.ADSKHelpTopic.facet.mincount=1,qf=text^1.5
: title^2 IndexTerm^.9 keywords^1.2 ADSKCommandSrch^2
: 
ADSKLikes^2,f.ADSKHelpTopic.facet.limit=-1,spellcheck.onlyMorePopular=false,rid=54.215.121.xx-collection1-1398xx4900921-48,q=How
: can I obtain local offline
: 
Help,f.ADSKDocumentType.facet.mincount=1,f.ADSKAudience.facet.limit=-1,isShard=true,f.ADSKProductLine.facet.limit=-1}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},facet_counts={facet_queries={},facet_fields={ADSKProductLine={},ADSKContentGroup={},ADSKReleaseYear={},ADSKHelpTopic={},ADSKDocumentType={},ADSKAudience={}},facet_dates={},facet_ranges={}},debug={}}</str>
:       </lst>
:       <lst name="http://54.215.122.xxx:8983/solr/collection1/|
: http://50.18.135.xxx:8983/solr/collection1/">
:         <str name="QTime">7</str>
:         <str name="ElapsedTime">15</str>
:         <str name="RequestPurpose">GET_TOP_IDS,GET_FACETS</str>
:         <str name="NumFound">0</str>
:         <str name="Response">{responseHeader={status=0,QTime=7,params={facet=on,tie=0.01,f.text.hl.fragsize=250,q.alt=*:*,facet.method=enum,f.ADSKAudience.facet.mincount=1,v.layout=layout,NOW=1398xx4900920,bq=Source2:sfdcarticles^3
: Source2:downloads^3 Source2:CloudHelp^2.5 Source2:blog^1
: Source2:discussion^2 Source2:documentation^1.5 Source2:youtube^1.5
: Source2:education-curriculum^2
: 
Source2:mne-help^1.5,fl=id,score,f.ADSKDocumentType.facet.limit=-1,bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0,facet.field=[ADSKProductLine,
: ADSKContentGroup, ADSKReleaseYear, ADSKHelpTopic, ADSKDocumentType,
: ADSKAudience],v.template=browse,fq=Source2:(mne-help OR CloudHelp OR
: documentation OR videos OR youtube OR discussion OR blog OR
: sfdcarticles OR downloads) AND -workflowparentid:[* TO *] AND
: -ADSKAccessMode:internal AND
: 
-ADSKAccessMode:beta,fsv=true,f.ADSKReleaseYear.facet.mincount=1,spellcheck.extendedResults=false,f.ADSKProductLine.facet.mincount=1,hl.fl=text
: 

Re: Roll up query with original facets

2014-05-01 Thread Chris Hostetter

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


: Subject: Roll up query with original facets
: From: Darin Amos dari...@gmail.com
: In-Reply-To: 1398953952.39792.yahoomail...@web124702.mail.ne1.yahoo.com
: Message-Id: 5902ae5b-7545-45d4-8662-a9700e1ec...@gmail.com
: References: d6259d1ccf526540b1cb447e5f3bc39b8e344f5...@gechem8mail.asg.com
:  1398953952.39792.yahoomail...@web124702.mail.ne1.yahoo.com


-Hoss
http://www.lucidworks.com/


Re: Roll up query with original facets

2014-05-01 Thread Darin Amos
My apologies!!
On May 1, 2014 6:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 https://people.apache.org/~hossman/#threadhijack
 Thread Hijacking on Mailing Lists

 When starting a new discussion on a mailing list, please do not reply to
 an existing message, instead start a fresh email.  Even if you change the
 subject line of your email, other mail headers still track which thread
 you replied to and your question is hidden in that thread and gets less
 attention.   It makes following discussions in the mailing list archives
 particularly difficult.


 : Subject: Roll up query with original facets
 : From: Darin Amos dari...@gmail.com
 : In-Reply-To: 1398953952.39792.yahoomail...@web124702.mail.ne1.yahoo.com
 
 : Message-Id: 5902ae5b-7545-45d4-8662-a9700e1ec...@gmail.com
 : References: 
 d6259d1ccf526540b1cb447e5f3bc39b8e344f5...@gechem8mail.asg.com
 :  1398953952.39792.yahoomail...@web124702.mail.ne1.yahoo.com


 -Hoss
 http://www.lucidworks.com/



Re: XSLT Caching Warning

2014-05-01 Thread Christopher Gross
The message implies that there is a better way of having XSLT
transformations.  Is that the case, or is there just this perpetual warning
for normal operations?

-- Chris


On Thu, May 1, 2014 at 5:08 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Chris,

 Looking at the source code reveals that the warning message is printed
 always, independent of the xsltCacheLifetimeSeconds value.


  /** singleton */
  private TransformerProvider() {
    // tell'em: currently, we only cache the last used XSLT transform, and
    // blindly recompile it once cacheLifetimeSeconds expires
    log.warn("The TransformerProvider's simplistic XSLT caching mechanism is"
        + " not appropriate for high load scenarios, unless a single XSLT"
        + " transform is used and xsltCacheLifetimeSeconds is set to a"
        + " sufficiently high value.");
  }





 On Thursday, May 1, 2014 11:29 PM, Christopher Gross cogr...@gmail.com
 wrote:
 I get this warning when Solr (4.7.2) Starts:
 WARN  org.apache.solr.util.xslt.TransformerProvider - The
 TransformerProvider's simplistic XSLT caching mechanism is not appropriate
 for high load scenarios, unless a single XSLT transform is used and
 xsltCacheLifetimeSeconds is set to a sufficiently high value.

 The solrconfig.xml setting is:
   <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
     <int name="xsltCacheLifetimeSeconds">10</int>
   </queryResponseWriter>

 Is there a different class that I should be using?  Is there a higher
 number than 10 that will do the trick?

 Thanks!

 -- Chris



Re: XSLT Caching Warning

2014-05-01 Thread Shawn Heisey
On 5/1/2014 7:30 PM, Christopher Gross wrote:
 The message implies that there is a better way of having XSLT
 transformations.  Is that the case, or is there just this perpetual warning
 for normal operations?

When I was using XSLT, I got a warning for every core, even though I had
a cached lifetime that would prevent problems.  I don't remember what
that lifetime was any more, probably at least five minutes, but it might
have been longer.  I also met the other criteria -- there was only one
XSLT defined.

Perhaps the warning needs to be suppressed if the lifetime is above a
certain value and there is only one transform defined?  I would expect
that even 60 seconds would be a long enough lifetime to prevent major
issues in high load scenarios ... but we could bikeshed that number forever.

Thanks,
Shawn



Re: XSLT Caching Warning

2014-05-01 Thread Alexandre Rafalovitch
I think the key message here is:
simplistic XSLT caching mechanism is not appropriate for high load scenarios.

As in, maybe this is not really a production-level component. One
exception is given, and it is not just the lifetime; it is also a
single transform.

Are you satisfying both of those conditions? If so, it's probably ok
to just ignore the warning.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, May 2, 2014 at 3:28 AM, Christopher Gross cogr...@gmail.com wrote:
 I get this warning when Solr (4.7.2) Starts:
 WARN  org.apache.solr.util.xslt.TransformerProvider - The
 TransformerProvider's simplistic XSLT caching mechanism is not appropriate
 for high load scenarios, unless a single XSLT transform is used and
 xsltCacheLifetimeSeconds is set to a sufficiently high value.

 The solrconfig.xml setting is:
   <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
     <int name="xsltCacheLifetimeSeconds">10</int>
   </queryResponseWriter>

 Is there a different class that I should be using?  Is there a higher
 number than 10 that will do the trick?

 Thanks!

 -- Chris