Re: Solr index searcher to lucene index searcher

2013-04-24 Thread parnab kumar
Hi,

Thanks Chris. I had been using Nutch 1.1. The Nutch IndexSearcher
used to call the Lucene IndexSearcher. As documents were collected into
TopDocs in Lucene, before they were passed back to Nutch, I would look
at the top K matching documents, consult an external repository,
further score the top K documents, and reorder them in the TopDocs array.
The reordered TopDocs was then passed to Nutch. All of this reordering
code was implemented by extending the Lucene IndexSearcher class.
The Lucene core that comes with Solr is a bit different
from the one that used to come with Nutch 1.1; as a result, implementing
the same thing is not straightforward. Moreover, I cannot figure out at
which point exactly the SolrIndexSearcher interacts directly with the
Lucene IndexSearcher.
With FunctionQuery I lose the flexibility of looking into
the documents before passing them to the final result set.

Now I am using Solr 3.4 and I would like to implement the same thing
between Solr and Lucene.
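For reference, a rough sketch of the shape being described: a custom last
SearchComponent that rewrites the collected DocList. The class name and the
reordering logic here are made up; SearchComponent, DocList and DocSlice are
real Solr 3.x classes, but treat this as an untested sketch, not a working
implementation:

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocSlice;

// Register as a last-component on the request handler; scores must be
// requested for it.score() to be meaningful.
public class ExternalRerankComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) { }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    DocList docs = rb.getResults().docList;
    int n = docs.size();
    int[] ids = new int[n];
    float[] scores = new float[n];
    DocIterator it = docs.iterator();
    for (int i = 0; it.hasNext(); i++) {
      ids[i] = it.nextDoc();
      scores[i] = it.score(); // replace with a score from the external repository
    }
    // ...reorder ids[]/scores[] here according to the external repository...
    rb.getResults().docList =
        new DocSlice(0, n, ids, scores, docs.matches(), docs.maxScore());
  }

  @Override public String getDescription() { return "external rerank"; }
  @Override public String getSource() { return ""; }
  @Override public String getSourceId() { return ""; }
  @Override public String getVersion() { return ""; }
}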

Thanks,
Pom

On Wed, Apr 24, 2013 at 3:05 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 :   . For any query it passes through the search handler and Solr finally
 :   directs it to the Lucene IndexSearcher. As results are matched and
 :   collected as TopDocs in Lucene, I want to inspect the top K docs,
 :   reorder them by some logic, and pass the final TopDocs to Solr, which
 :   Solr may send as a response.

 can you elaborate on what exactly your "some logic" involves?

 instead of writing a custom collector, using a function query may be the
 best solution.

 https://people.apache.org/~hossman/#xyproblem
 XY Problem

 Your question appears to be an XY Problem ... that is: you are dealing
 with X, you are assuming Y will help you, and you are asking about Y
 without giving more details about the X so that we can understand the
 full issue.  Perhaps the best solution doesn't involve Y at all?
 See Also: http://www.perlmonks.org/index.pl?node_id=542341


 -Hoss



Re: Update on shards

2013-04-24 Thread Arkadi Colson
We are using Tomcat so we'll just wait. Hopefully it's fixed in 4.3, but 
we have a workaround for now so...


What exactly is the difference between Jetty and Tomcat? We are using 
Tomcat because we've read somewhere that it should be more robust in 
heavily loaded production environments.


Arkadi

On 04/23/2013 06:14 PM, Mark Miller wrote:

If you use Jetty - which you should :) - it's what we test with. Tomcat only gets 
user testing.

If you use tomcat, this won't work in 4.2 or 4.2.1, but probably will in 4.3 
(we are voting on 4.3 now).

No clue on other containers.

- Mark

On Apr 23, 2013, at 10:59 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:


I believe as of 4.2 you can talk to any host in the cloud.
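A quick SolrJ sketch of what that looks like in practice (the host and
collection names are placeholders; on 4.2+ the receiving node forwards the
update to the correct shard leader):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AnyNodeUpdate {
  public static void main(String[] args) throws Exception {
    // Point at any node in the cluster, not necessarily one hosting the shard.
    HttpSolrServer server =
        new HttpSolrServer("http://any-node:8081/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    server.add(doc);
    server.commit();
  }
}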

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Apr 23, 2013 at 10:45 AM, Arkadi Colson ark...@smartbit.be wrote:

Hi

Is it correct that when inserting or updating a document into Solr you have to
talk to a Solr host where at least one shard of that collection is stored,
while for selects you can talk to any host within the collection.configName?

BR,
Arkadi







JVM Parameters to Startup Solr?

2013-04-24 Thread Furkan KAMACI
Lucidworks Solr Guide says that:

If you are using Sun's JVM, add the -server command-line option when you
start Solr. This tells the JVM that it should optimize for a long running,
server process. If the Java runtime on your system is a JRE, rather than a
full JDK distribution (including javac and other development tools), then
it is possible that it may not support the -server JVM option

Is anyone using the -server parameter? Also, what parameters are you using
to start up Solr? I mean the parallel garbage collector, etc.?


Luke misreporting index-time boosts?

2013-04-24 Thread Timothy Hill
Hello, all

I have recently been attempting to apply index-time boosts to fields using
the following syntax:

<add>
  <doc>
    <field name="important_field" boost="5">bleah bleah bleah</field>
    <field name="standard_field" boost="2">content here</field>
    <field name="trivial_field">content here</field>
  </doc>
  <doc>
    <field name="important_field" boost="5">content here</field>
    <field name="standard_field" boost="2">bleah bleah bleah</field>
    <field name="trivial_field">content here</field>
  </doc>
</add>

The intention is that matches on important_field should be more important
to score than matches on trivial_field (so that a search across all fields
for the term 'content' would return the second document above the first),
while still being able to use the standard query parser.

Looking at output from Luke, however, all fields are reported as having a
boost of 1.0.

The following possibilities occur to me.

(1) The entire index-time-boosting approach is misconceived
(2) Luke is misreporting, because index-time boosting alters more
fundamental aspects of scoring (tf-idf calculations, I suppose), and the
index-time boost is thus invisible to it
(3) Some combination of (1) and (2)

Can anyone help illuminate the situation for me? Documentation for these
questions seems patchy.

Thanks,

Tim


Facets with OR clause

2013-04-24 Thread vsl
Hi,

my request contains the following term:

There are 3 facets:
groups, locations, categories.



When I select some items, I see this syntax in my request:
fq=groups:group1&fq=locations:location1

Is it possible to add an OR clause between facet items in the query?





Re: Facets with OR clause

2013-04-24 Thread Kai Becker
Try fq=(groups:group1 OR locations:location1)
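A full request might then look like this (facet fields taken from the post
above; the host is a placeholder, and the spaces and parentheses should be
URL-encoded in practice):

http://localhost:8983/solr/select?q=*:*&fq=(groups:group1 OR locations:location1)&facet=true&facet.field=groups&facet.field=locations&facet.field=categories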

On 24.04.2013, at 12:39, vsl wrote:

 Hi,
 
 my request contains following term:
 
 The are 3 facets:
 groups, locations, categories.
 
 
 
 When I select some items then I see such syntax in my request.
 fq=groups:group1&fq=locations:location1
 
 Is it possible to add OR clause between facets items in query?
 
 
 



Re: Listing Priority

2013-04-24 Thread Jan Høydahl
Hi,

Check out the new RegexpBoostProcessor 
https://lucene.apache.org/solr/4_2_0/solr-core/org/apache/solr/update/processor/RegexpBoostProcessor.html
which does exactly this, based on a config file.
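A minimal update-chain sketch (the parameter names below are a best
recollection of that factory's config and should be verified against the
Javadoc above; the file name is just an example):

<updateRequestProcessorChain name="regexboost">
  <processor class="solr.RegexpBoostProcessorFactory">
    <str name="inputField">url</str>
    <str name="boostField">urlboost</str>
    <str name="boostFilename">url-boosts.txt</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

where url-boosts.txt holds one regexp and boost per line, e.g.:

.*\.edu$  3.0
.*\.co\.uk$  2.0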

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24 Apr 2013, at 00:22, Furkan KAMACI furkankam...@gmail.com wrote:

 Let's assume that I have written an update processor and extracted the
 domain and checked it against my predefined list. What should I do at indexing
 time and at select time?
 
 
 2013/4/15 Alexandre Rafalovitch arafa...@gmail.com
 
 You may find the work and code contributions by Jan Høydahl quite
 relevant. See the presentation from 2 years ago:
 
 http://www.slideshare.net/lucenerevolution/jan-hoydahl-improving-solrs-update-chain-eurocon2011
 
 One of the things he/they contributed is URLClassify Update Processor,
 it might be quite relevant.
 
 https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
 
 Regards,
   Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Sun, Apr 14, 2013 at 4:59 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 I have crawled some internet pages and indexed them at Solr.
 
 When I list my results via Solr I want the following: if a page has a URL (my
 schema includes a field for the URL) that ends with .edu, .edu.az or .co.uk,
 I will give it more priority.
 
 How can I do this in a more efficient way in Solr?
 



Re: How to let Solr load libs from within my JAR?

2013-04-24 Thread Jan Høydahl
Hi,

The Java class loader does not support a JAR within a JAR. You'll have to unpack both 
JARs and then JAR them together as one. Or simply give several JARs to Solr; 
that's the easiest.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24 Apr 2013, at 03:37, Xi Shen davidshe...@gmail.com wrote:

 Hi,
 
 I developed a data import handler that has some dependent libraries. I
 deployed them in a folder parallel to my JAR and included the path in
 solrconfig.xml. It works fine. But I am thinking maybe I can pack those JAR
 libs within my JAR; however, I got a NoClassDefFoundError exception when
 executing my DIH.
 
 Is it possible for Solr to load JAR libs packed in my JAR? How can I do that?
 
 
 -- 
 Regards,
 David Shen
 
 http://about.me/davidshen
 https://twitter.com/#!/davidshen84



Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundires

2013-04-24 Thread meghana
I have configured WordDelimiterFilterFactory with custom types for '&'
and '-', and for a few delimiters (like . _ :) we need to split on word
boundaries only. 

e.g. 
test.com (should be tokenized to test.com)
newyear.  (should be tokenized to newyear)
new_car (should be tokenized to new_car)
..
..

Below is the definition for the text field:

<fieldType name="text_general_preserved" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="0"
            protected="protwords_general.txt"
            types="wdfftypes_general.txt"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            stemEnglishPossessive="0"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="0"
            protected="protwords_general.txt"
            types="wdfftypes_general.txt"
    />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Below is the wdfftypes_general.txt content:

& => ALPHA
- => ALPHA
_ => SUBWORD_DELIM
: => SUBWORD_DELIM
. => SUBWORD_DELIM

The types that can be used in WordDelimiterFilter are LOWER, UPPER, ALPHA,
DIGIT, ALPHANUM and SUBWORD_DELIM. There's no description available for the
use of each type. Going by the name, I thought the SUBWORD_DELIM type might
fulfill my need, but it doesn't seem to work. 

Can anybody suggest how I can configure WordDelimiterFilterFactory
to fulfill my requirement? 

Thanks.





Re: SOLR 4.3

2013-04-24 Thread Jan Høydahl
As you can see on the issue, it is already fixed for 4.3.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24 Apr 2013, at 07:02, William Bell billnb...@gmail.com wrote:

 Can we get this in please to 4.3?
 
 https://issues.apache.org/jira/browse/SOLR-4746
 
 
 -- 
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Re: Bug? JSON output changes when switching to solr cloud

2013-04-24 Thread Erick Erickson
Note that 4.3 is being cut right now; it will probably be out next week,
barring unforeseen problems.

Best
Erick

On Mon, Apr 22, 2013 at 9:11 PM, David Parks davidpark...@yahoo.com wrote:
 Thanks Yonik! That was fast!
 We switched over to XML for the moment and will switch back to JSON when 4.3
 comes out.
 Dave


 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Monday, April 22, 2013 8:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Bug? JSON output changes when switching to solr cloud

 Thanks David,

 I've confirmed this is still a problem in trunk and opened
 https://issues.apache.org/jira/browse/SOLR-4746

 -Yonik
 http://lucidworks.com


 On Sun, Apr 21, 2013 at 11:16 PM, David Parks davidpark...@yahoo.com
 wrote:
 We just took an installation of 4.1 which was working fine and changed
 it to run as solr cloud. We encountered the most incredibly bizarre
 apparent bug:

 In the JSON output, a colon ':' changed to a comma ',', which of
 course broke the JSON parser.  I'm guessing I should file this as a
 bug, but it was so odd I thought I'd post here before doing so. Demo
 below:

 Here is a query on our previous single-server instance:

 Query:
 --
  http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50

 Response:
 -
  {"responseHeader":{"status":0,"QTime":15714,"params":{"fl":"score,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","group.field":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":{"unique_catalog_name":{"matches":106711214,"groups":[{"groupValue":"ls:2653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7039795,"docs":[{"id":"1005502088784","score":4.7039795},{"id":"1005500291075","score":4.7039795},{"id":"1000810546074","score":4.7039795},{"id":"1000611003270","score":4.7039795}, ...

  Note this part:
  --
    "grouped":{"unique_catalog_name":{"matches":



 Now we run that same query on a server that was derived from the same
 build, just configuration changes to run it in distributed solr cloud
 mode.

 Query:
 -
  http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50

  Response:
  -
  {"responseHeader":{"status":0,"QTime":8855,"params":{"fl":"score,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","group.field":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":["unique_catalog_name",{"matches":106711214,"groups":[{"groupValue":"ls:2653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7042913,"docs":[{"id":"1005502088784","score":4.7042913},{"id":"1000611003270","score":4.7042913},{"id":"1005500291075","score":4.703668},{"id":"1000810546074","score":4.703668}, ...

  Note how it's changed:
  
    "grouped":["unique_catalog_name",{"matches":







Re: Fields issue 4.2.1

2013-04-24 Thread Jan Høydahl
Hi,

Have you tried fl=*_user ?

I think fl may try to interpret the number as a function.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24 Apr 2013, at 07:16, William Bell billnb...@gmail.com wrote:

 I am getting no results when using a dynamic field whose name begins with
 numbers.
 
 This is okay on 3.6, but does not work in 4.2.
 
 dynamic name: 1234566_user
 
 fl=1234566_user
 
 If I change it to name: user_1234566 it works.
 
 This appears to be a bug.
 
 
 -- 
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Solr consultant recommendation

2013-04-24 Thread Christian von Wendt-Jensen
Hi

We have some detailed Solr setup issues we would like to discuss with a Solr 
Expert (certified or self-declared), but we are having some difficulties 
getting in contact with anyone here in Copenhagen, Denmark.

Therefore I would like to hear if anybody out there can drop me some names of 
Solr Experts to contact, available in Denmark?

We have issues regarding hardware setup (storage, RAM, cores per instance, 
instances per machine), Solr Cloud vs. classic master/slave, shard size, to 
store or not to store, automated deployment of (more) shards, cache 
optimization, garbage collection issues, field collapsing, PERFORMANCE. You 
name it and we probably have it as an issue to discuss.

We are currently running a setup of ~450 million documents, receiving over 1 
million/day. An interesting challenge, if you ask me…

If YOU are the one, then please get in contact.



Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K

Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email: christian.sonne.jen...@infopaq.com
Web: www.infopaq.com











Re: Solr consultant recommendation

2013-04-24 Thread Gora Mohanty
On 24 April 2013 16:28, Christian von Wendt-Jensen
christian.vonwendt-jen...@infopaq.com wrote:

 Hi

 We have some detailed Solr setup issues we would like to discuss with a
 Solr Expert (certified or self-declared), but we are having some
 difficulties getting in contact with anyone here in Copenhagen, Denmark.

 Therefore I would like to hear if anybody out there can drop me some names
 of Solr Experts to contact, available in Denmark?
[...]

Have you looked at http://wiki.apache.org/solr/Support ?

Regards,
Gora


Re: Solr consultant recommendation

2013-04-24 Thread Christian von Wendt-Jensen
Actually no, I didn't. But I can see that I should have. Thanks!




Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K

Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email: christian.sonne.jen...@infopaq.com
Web: www.infopaq.com









From: Gora Mohanty g...@mimirtech.com
Reply-To: solr-user@lucene.apache.org
Date: Wed, 24 Apr 2013 13:02:03 +0200
To: solr-user@lucene.apache.org
Subject: Re: Solr consultant recommendation
Subject: Re: Solr consultant recommendation

On 24 April 2013 16:28, Christian von Wendt-Jensen
christian.vonwendt-jen...@infopaq.com wrote:

Hi

We have some detailed Solr setup issues we would like to discuss with a
Solr Expert (certified or self-declared), but we are having some
difficulties getting in contact with anyone here in Copenhagen, Denmark.

Therefore I would like to hear if anybody out there can drop me some names
of Solr Experts to contact, available in Denmark?
[...]

Have you looked at http://wiki.apache.org/solr/Support ?

Regards,
Gora



Re: Too many unique terms

2013-04-24 Thread Erick Erickson
Even if you could know ahead of time, 7M stop words is a
lot to maintain. But assuming that your index is really
pretty static, you could consider building it once, then
creating the stopword file from unique terms and re-indexing.

You could consider cleaning them on the input side or
creating a custom filter that, say, checked against a dictionary
(that you'd have to find).

There's nothing that I know of that'll allow you to delete
unique terms from a static index.

About a regex, you could use PatternReplaceCharFilterFactory
to remove them from your input stream, but the trick is defining
"useless". Part numbers are really useful in some situations,
for instance. There's nothing standard because there's no
standard. You haven't, for instance, provided any criteria for
what "useless" is. Do you care about e-mails? What about
accents? Unicode? The list gets pretty endless.

You should be able to write a regex that removes
everything non-alpha-numeric or some such for instance,
although even that is a problem if you're indexing anything but
plain-vanilla English. The Java pre-defined '\w', for instance,
refers to [a-zA-Z_0-9]. Nary an accented character in sight.
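For example, a charFilter like this, placed before the tokenizer in the
index analyzer, would strip long hex/digit runs before they are ever
tokenized (the pattern is only one guess at what "useless" means for your
data):

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="\b[0-9A-Fa-f]{16,}\b" replacement=""/>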


Best
Erick

On Tue, Apr 23, 2013 at 3:53 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
 Hi there,
 Looking at one of my shards (about 1M docs) I see lots of unique terms, more
 than 8M, which is a significant part of my total term count. These are very
 likely useless terms, binaries or other meaningless numbers that come with
 a few of my docs.
 I am totally fine with deleting them so that these terms would be unsearchable.
 Thinking about it, I get that:
 1. It is impossible to know a priori whether a term is unique, so I
 cannot add them to my stop words.
 2. I have a performance decrease because my cached chunks contain useless
 data, and I'm short on memory.

 Assuming a constant index, is there a way of deleting all terms that are
 unique from at least the dictionary .tim and .tip files? Will I get a
 significant query-time performance increase? Does anybody know a class of
 regexes that identifies meaningless terms that I can add to my update processor?

 Thanks
 Manu


Fwd: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-24 Thread Dmitry Kan
Hello list,

We deal with an anomaly when doing a distributed facet query against 102
shards.

The problem manifests itself in both the frontend solr (router) and a
shard. Each time the request is executed, always different shard is
affected (at random, hence the anomaly).

The query is: http://router_host:router_port/solr/select?q=test&facet=true&facet.field=field_of_type_long&facet.limit=1330&facet.mincount=1&rows=1&facet.sort=index&facet.zeros=false&facet.offset=0
I have omitted the shards parameter.

The router log:

request: http://10.155.244.181:9150/solr/select
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Notice the port of the shard that is affected. That port changes all the
time, even for the same request.
The log entry is prepended with these lines:

SEVERE: org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

(they are not in the pastebin link)

The shard log:

Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Thread.java:722)

Apr 24, 2013 11:08:49 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={} status=500 QTime=2
Apr 24, 2013 11:08:49 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at 

Re: Solr consultant recommendation

2013-04-24 Thread Charlie Hull

On 24/04/2013 11:58, Christian von Wendt-Jensen wrote:

Hi

We have some detailed Solr setup issues we would like to discuss with
a Solr Expert (certified or self-declared), but we are having some
difficulties getting in contact with anyone here in Copenhagen,
Denmark.

Therefore I would like to hear if anybody out there can drop me some
names of Solr Experts to contact, available in Denmark?

We have issues regarding hardware setup (storage, RAM, cores pr
instance, instances per machine), Solr Cloud vs Classic Master/Slave,
shard size, to store or not to store, automated deployment of (more)
shards, cache optimization, garbage collection issues, field
collapsing, PERFORMANCE. You name it and we probably have it as an
issue to discuss.

We are currently running a setup of ~450 mio documents, receiving
+1mio/day. Interesting challenge, if you ask me…

If YOU are the one, then please get in contact.


Hi Christian,

We are based in the UK but have worked for a client in Copenhagen with a 
large Solr index - in fact I was there last week visiting another 
potential client. You can find out more about us from www.flax.co.uk - 
generally we work remotely but the flight from our local airport is only 
1hr20m. Do get in touch if I can tell you more.


Cheers

Charlie





Med venlig hilsen / Best Regards

Christian von Wendt-Jensen IT Team Lead, Customer Solutions

Infopaq International A/S Kgs. Nytorv 22 DK-1050 København K

Phone +45 36 99 00 00 Mobile +45 31 17 10 07
Email: christian.sonne.jen...@infopaq.com
Web: www.infopaq.com













--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Querying only for + character causes org.apache.lucene.queryParser.ParseException

2013-04-24 Thread Jorge Luis Betancourt Gonzalez
One more thing:

What about the hack you mentioned when the query is a combination of restricted
query operators such as +-, +, --++--+%, etc.? In these cases the application
has to deal with all of them too.

Greetings!

- Original Message -
From: Jérôme Étévé jerome.et...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 23 April 2013 10:44:39
Subject: Re: Querying only for + character causes 
org.apache.lucene.queryParser.ParseException

If you want to allow your users to search for '+', you can also define
'+' as a regular ALPHA character:

In config:

delimiter_types.txt:

#
# We let +, # and * be part of normal words.
# This is to let c++, c#, c* and R&D as words.
#
+ => ALPHA
 # => ALPHA
* => ALPHA
& => ALPHA
@ => ALPHA

Then in your solr.WordDelimiterFilterFactory,
use types="delimiter_types.txt".


You'll then be able to let your users search for + as part of a word.

If you want to allow them to search for just '+', a little hacking is
necessary in your client code. Personally, I just double quote the query
if it's only one char long. It can't be harmful, and as it will turn your
single + into "+", it will be considered a token (rather than being
part of the query syntax) by the parser.
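In client code that's a one-liner, e.g. in Java (purely illustrative, not an
existing library call):

String q = raw.trim().length() == 1 ? "\"" + raw.trim() + "\"" : raw;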

Providing you're using the edismax parser, it should be just fine for any
other queries, like '+ foo' , 'foo +', '++' ...


J.


On 23 April 2013 15:09, Jorge Luis Betancourt Gonzalez
jlbetanco...@uci.cu wrote:

 Hi Kai:

  Thanks for your reply. From what I've understood, this logic must be
 included in my application. Would it be possible, for instance, to use some
 regular expression at query time in my schema to reject a query that
 contains only these characters? For instance, + and + would be good
 catches to avoid.

 Thanks in advance!

 - Original Message -
 From: Kai Becker m...@kai-becker.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, 23 April 2013 9:48:26
 Subject: Re: Querying only for + character causes
 org.apache.lucene.queryParser.ParseException

 Hi,

 you need to escape that char in search terms.
 Special chars are + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / at the moment.

 The %2B is just the url encoding, but it will still be a + for Solr, so
 just put a \ in front of the chars I mentioned.

 Cheers,
 Kai

 On 23.04.2013, at 15:41, Jorge Luis Betancourt Gonzalez wrote:

  Hi!
 
  Currently I'm working on a basic search engine; the main problem is
 that during some tests a problem was detected: in the application, if a user
 searches for only the + or - term, or the + string, it causes an
 exception in my application. The problem is caused by an
 org.apache.lucene.queryParser.ParseException in Solr. I get the same
 response if, from the Solr admin interface, I search for the + term. From
 what I've seen, the + character gets encoded into %2B, which causes the
 exception. Is there any way of escaping this character so it behaves like
 any other character? Or at least to get no response for these cases?
 
  I'm using solr 3.6.2, deployed in tomcat7.
 
  Greetings!




--
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/



Solr as a jar file with Embedded Jetty

2013-04-24 Thread Furkan KAMACI
Hi;
I am new to Solr and I was using Solr as war file and deploying it into
Tomcat. However I decided to use Solr as jar file with Embedded Jetty. I
was doing like that: when I run dist at ant I get .war file of Solr and
used to deploy to Tomcat.
I want to use it as a jar file as like start.jar under example folder. What
should I do, what is that solr-core-4.2.1-SNAPSHOT?
When you change code and want to use Solr in a production environment what
do you do. Should I use that start.jar, how to compile it.


solr.StopFilterFactory doesn't work with wildcard

2013-04-24 Thread Dmitry Baranov
Good day!

I have a problem with solr.StopFilterFactory and wildcard text search.
For a query like 'hp* pavilion* series* d4*', where 'series' is a stop
word, I receive the error:
'analyzer returned no terms for multiTerm term: series'
But for a query like 'hp* pavilion* series d4*', I receive the expected
results.

Could you help me?

I have field type for search as below:

<fieldType name="search_string" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

Solr version:

solr-spec   4.0.0.2012.10.06.03.04.33
solr-impl   4.0.0 1394950 - rmuir - 2012-10-06 03:04:33
lucene-spec 4.0.0
lucene-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:00:40





Re: Solr as a jar file with Embedded Jetty

2013-04-24 Thread Erik Hatcher
I'm not following exactly what you want, but the recommendation you'll get from 
the majority of folks is to simply use Solr's example/ directory as a starting 
point.  start.jar is Jetty and it's how most of us deploy Solr, and I'll 
recommend going that route.   Solr in Jetty is a .war file.  If you want to 
diverge from that path, you're in unrecommended territory.
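(Concretely: cd example; java -jar start.jar, optionally with
-Dsolr.solr.home=/path/to/your/solr/home to point it at your own config.)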

Erik

On Apr 24, 2013, at 09:41 , Furkan KAMACI wrote:

 Hi;
 I am new to Solr and I was using Solr as war file and deploying it into
 Tomcat. However I decided to use Solr as jar file with Embedded Jetty. I
 was doing like that: when I run dist at ant I get .war file of Solr and
 used to deploy to Tomcat.
 I want to use it as a jar file as like start.jar under example folder. What
 should I do, what is that solr-core-4.2.1-SNAPSHOT?
 When you change code and want to use Solr in a production environment what
 do you do. Should I use that start.jar, how to compile it.



Re: Autocommit and replication have been slowing down

2013-04-24 Thread gustavonasu
Hi Shawn,

Thanks for the lesson! I really appreciate your help.

I'll figure out a way to use that knowledge to solve my problem.

Best Regards





Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-24 Thread Majirus FANSI
I would create a new core as slave of the existing configuration without
replicating the core schema and configuration. This way I can get the
information from one index to the other while saving the space as fields in
the new schema are mainly not stored. After the replication I would swap
the cores for the online core to point to the right index dir and conf.
i.e. the one with less stored fields.

Maj


On 24 April 2013 01:48, Petersen, Robert
robert.peter...@mail.rakuten.com wrote:

 Hey, I just want to verify one thing before I start doing this: function
 queries only require fields to be indexed, but don't require them to be
 stored, right?

 -Original Message-
 From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
 Sent: Tuesday, April 23, 2013 4:39 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr 3.6.1: changing a field from stored to not stored

 Good info, Thanks Hoss!  I was going to add a more specific fl= parameter
 to my queries at the same time.  Currently I am doing fl=*,score so that
 will have to be changed.


 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Tuesday, April 23, 2013 4:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6.1: changing a field from stored to not stored


 : index?  I noticed I am unnecessarily storing some fields in my index and
 : I'd like to stop storing them without having to 'reindex the world' and
 : let the changes just naturally percolate into my index as updates come
 : in the normal course of things.  Do you guys think I could get away with
 : this?

 Yes, you can easily get away with this type of change w/o re-indexing,
 however you won't gain any immediate index size savings until each and
 every existing doc has been reindexed and the old copies expunged from the
 index via segment merges.

 the one hiccup that can affect people when doing this is what happens if
 you use something like fl=* (and likely hl=* as well) ... many places
 in Solr will try to avoid failure if a stored field is found in the index
 which isn't defined in the schema, and treat that stored value as a string
 (legacy behavior designed to make it easier for people to point Solr at old
 lucene indexes built w/o using Solr) ... so if these stored values are not
 strings, you might get some weird data in your response for these documents.


 -Hoss







RE: ranking score by fields

2013-04-24 Thread Каскевич Александр
Highlighter doesn't help me. It marks terms but not the search text. 
E.g. I have a doc with field1="apache lucene", field2="apache solr". I search 
"apache solr" with the AND default option. I find this doc with highlighted 
field1=<em>apache</em> lucene. This is a bad result for me.

Look I want to do something like this:

Search text: apache solr
RESULT:
Found in field1
Doc1
Doc2
Doc3
...
Found in field2 
Doc101
Doc102
Doc103
...

The search result has two (or more) parts. Each part is sorted by another 
field, e.g. by a date field descending.

It means I need to make the right sort, and I need some flag to insert the 
"Found in field2" text.

I try 
q=apache solr
fl=field1, field2, score, val1:$q1, val2:$q2
defType=dismax qf=field1^1000 field2^1
q1={!dismax qf=field1 v='apache solr'}
q2={!dismax qf=field2 v='apache solr'}

Now I have flags: val1 > 0 means "found in field1".
But now I have a problem with sort: I can't use val1, val2 in the sort :(.

And now my questions:
1. Is it possible to use my custom fields val1, val2 in a sort? With a formula,
or with the params $q1, $q2?
2. Maybe it is possible to set the score by a formula at query time?
3. Your variant?

Thanks.
Alex.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, April 23, 2013 1:53 AM
To: solr-user@lucene.apache.org
Subject: Re: ranking score by fields

You can sometimes use the highlighter component to do this, but it's a little 
tricky...

But note your syntax isn't doing what you expect.
(field1:apache solr) parses as field1:apache defaultfield:solr. You want 
field1:(apache solr)

debug=all is your friend for these kinds of things, especially the parsed 
query section

Best
Erick

On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр akaskev...@prontosoft.by 
wrote:
 Hi.
 I want to do what the subject says, but I don't know exactly how to do it.
 Example:
 I have an index with field1, field2, field3.
 I make a query like:
 (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
 And I want to know: was this doc found by field1, by field2 or by field3?

 I tried to make it like this: (field1:apache solr)^100 OR (field2:apache
 solr)^10 OR (field3:apache solr)^1. But the problem is that I don't know the
 range, minimum and maximum values of the score for each field.
 With other types of similarities (BM25 or others) it is the same situation.
 I can't find information about this in the manual.

 Also, I tried to use relevance functions, e.g. termfreq, but they work only
 with terms, not with phrases like "apache solr".

 Maybe I missed something, or you have another idea how to do this?
 And also, I am not a Java programmer, and the best way for me doesn't involve
 writing any plugins for Solr.

 Thanks.
 Alex.


Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
Have you tried using absolute paths for the relevant dirs? That will
cleanly split the problem into 'still not working' and 'wrong relative
path'.
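E.g., instead of the relative form, something like this (the path is just an
example):

<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />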

Regards,
   Alex.
On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
   <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: Indexing PDF Files

2013-04-24 Thread Erik Hatcher
Also, at Solr startup time it logs what it loads from those <lib> elements, so 
you can see whether it is loading the files you intend to or not.

Erik

On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote:

 Have you tried using absolute path to the relevant urls? That will
 cleanly split the problem into 'still not working' and 'wrong relative
 path'.
 
 Regards,
   Alex.
 On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
 
 
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)



Re: Fields issue 4.2.1

2013-04-24 Thread Jack Krupansky
Field names don't absolutely have to follow Java naming conventions, but if 
they don't then they are not GUARANTEED to work in all contexts in Solr. The 
fl parameter is one of those contexts.


You can work around it by using a function query: field(1234566_user)
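E.g. fl=id,score,u:field(1234566_user) (the alias "u" is arbitrary).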

-- Jack Krupansky

-Original Message- 
From: William Bell

Sent: Wednesday, April 24, 2013 1:16 AM
To: solr-user@lucene.apache.org
Subject: Fields issue 4.2.1

I am getting no results when using dynamic field, and the name begins with
numbers.

This is okay on 3.6, but does not work in 4.2.

dynamic name: 1234566_user

fl=1234566_user

If I change it to name: user_1234566 it works.

This appears to be a bug.


--
Bill Bell
billnb...@gmail.com
cell 720-256-8076 



Re: Book text with chapter line number

2013-04-24 Thread Timothy Potter
Chapter seems too broad and line seems too narrow -- have you thought
about paragraph level? Something like:

docID, book fields (title, author, publisher, etc), chapter fields (#,
title, pages, etc), section fields (title, #, etc), sub-sectionN
fields, paragraph text, lines

Seems like line #'s would only be useful for display so just store the
lines the paragraph covers.
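For example, one paragraph-level document could look like this (the field
names are only illustrative):

<add>
  <doc>
    <field name="id">book42_ch03_p0007</field>
    <field name="book_title">Some Book</field>
    <field name="chapter_number">3</field>
    <field name="chapter_title">Some Chapter</field>
    <field name="paragraph_text">...full paragraph text...</field>
    <field name="line_start">112</field>
    <field name="line_end">118</field>
  </doc>
</add>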



On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org wrote:
 If you can represent your books in XML, then MarkLogic could do the job very 
 cleanly. It isn't free, but it is very good.

 wunder

 On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:

 Is there a better tool than Solr to use for my situation?


 On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com wrote:

 There is no simple, obvious, and direct approach, right out of the box. 
 Sure, you can highlight passages of raw text, right out of the box, but 
 that won't give you chapters, pages, and line numbers. To do all of that, 
 you would have to either:

 1. Add chapter, page, and line number as part of the payload for each word. 
 And add some custom document transformers to access the information.
 or
 2. Index each line as a separate Solr document, with fields for book, 
 chapter, page, and line number.

 -- Jack Krupansky

 -Original Message- From: Jason Funk
 Sent: Tuesday, April 23, 2013 5:02 PM
 To: solr-user@lucene.apache.org
 Subject: Book text with chapter line number

 Hello.

 I'm trying to figure out if Solr is going to work for a new project that I 
 am wanting to build. At its heart it's a book text searching application. 
 Each book is broken into chapters and each chapter is broken into lines. I 
 want to be able to search these books and return relevant sections of the 
 book and display the results with chapter and line number. I'm not sure how 
 I would structure my data so that it's efficient and functional. I could 
 simply treat each line of text as a document which would provide some of 
 the functionality but what if the search query spanned two lines? Then it 
 seems the passage the user was searching for wouldn't be returned. I could 
 treat each book as a document and use highlighting to find the context but 
 that seems to limit weighting/results for best matches as well as 
 difficultly in finding chapter/line numbers. What is the best way to do 
 this with Solr?

 Is there a better tool to use to solve my problem?


 --
 Walter Underwood
 wun...@wunderwood.org





Re: Book text with chapter line number

2013-04-24 Thread Paul Libbrecht
It's easy to then store a map of term position to line-number and page-number 
along with each paragraph, or?

Paul


On 24 avr. 2013, at 16:24, Timothy Potter wrote:

 Chapter seems too broad and line seems too narrow -- have you thought
 about paragraph level? Something like:
 
 docID, book fields (title, author, publisher, etc), chapter fields (#,
 title, pages, etc), section fields (title, #, etc), sub-sectionN
 fields, paragraph text, lines
 
 Seems like line #'s would only be useful for display so just store the
 lines the paragraph covers.
 
 
 
 On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 If you can represent your books in XML, then MarkLogic could do the job very 
 cleanly. It isn't free, but it is very good.
 
 wunder
 
 On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:
 
 Is there a better tool than Solr to use for my situation?
 
 
 On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com wrote:
 
 There is no simple, obvious, and direct approach, right out of the box. 
 Sure, you can highlight passages of raw text, right out of the box, but 
 that won't give you chapters, pages, and line numbers. To do all of that, 
 you would have to either:
 
 1. Add chapter, page, and line number as part of the payload for each 
 word. And add some custom document transformers to access the information.
 or
 2. Index each line as a separate Solr document, with fields for book, 
 chapter, page, and line number.
 
 -- Jack Krupansky
 
 -Original Message- From: Jason Funk
 Sent: Tuesday, April 23, 2013 5:02 PM
 To: solr-user@lucene.apache.org
 Subject: Book text with chapter line number
 
 Hello.
 
 I'm trying to figure out if Solr is going to work for a new project that I 
 am wanting to build. At it's heart it's a book text searching application. 
 Each book is broken into chapters and each chapter is broken into lines. I 
 want to be able to search these books and return relevant sections of the 
 book and display the results with chapter and line number. I'm not sure 
 how I would structure my data so that it's efficient and functional. I 
 could simply treat each line of text as a document which would provide 
 some of the functionality but what if the search query spanned two lines? 
 Then it seems the passage the user was searching for wouldn't be returned. 
 I could treat each book as a document and use highlighting to find the 
 context but that seems to limit weighting/results for best matches as well 
 as difficultly in finding chapter/line numbers. What is the best way to do 
 this with Solr?
 
 Is there a better tool to use to solve my problem?
 
 
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 



Re: solr.StopFilterFactory doesn't work with wildcard

2013-04-24 Thread Jack Krupansky
Well, what is happening is that the query parser detects a prefix query 
(series*) and then does term analysis on the prefix alone ("series"), 
which you probably have in your stop words list; this causes the analyzer 
to return... nothing, which is what the error is complaining about.


You can work around it by querying for serie* (as long as "serie" is not also 
a stop word).


In any case, technically, the stop filter is doing exactly what it is 
supposed to do.


In all honesty, I can't imagine a context in which a noun such as "series" 
would be on a stop word list. What's your thinking on why it is there?


-- Jack Krupansky

-Original Message- 
From: Dmitry Baranov

Sent: Wednesday, April 24, 2013 9:43 AM
To: solr-user@lucene.apache.org
Subject: solr.StopFilterFactory doesn't work with wildcard

Good day!

I have a problem with the solr.StopFilterFactory and wildcard text search.
For query like this 'hp* pavilion* series* d4*', where 'series' is stop
word, I recieve error:
'analyzer returned no terms for multiTerm term: series'
But for query like this 'hp* pavilion* series d4*', I recieve expected
results.

Could you help me?

I have field type for search as below:

<fieldType name="search_string" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
  <analyzer type="multiterm">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>

Solr version:

solr-spec 4.0.0.2012.10.06.03.04.33
solr-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:04:33
lucene-spec 4.0.0
lucene-impl 4.0.0 1394950 - rmuir - 2012-10-06 03:00:40






Solr faceted search UI

2013-04-24 Thread richa
Hi,
I am working on a POC, where I have to display faceted search result on web
page. can anybody please help me to suggest what all set up I need to
configure to display. I would prefer java technologies. Just to mention, I
have solr cloud running on remote server.
I would like to know:
1. Should I use MVC framework?
2. How will my local interact with remote solr server?
3. How will I send query through java code and what technology I should use
to display faceted search result?

Please help me on this.

Thanks,





Re: Update on shards

2013-04-24 Thread Shawn Heisey
On 4/24/2013 12:49 AM, Arkadi Colson wrote:
 We are using tomcat so we'll just wait. Hopefully it's fixed in 4.3 but
 we have a work around for now so...
 
 What exactly is the difference between jetty and tomcat. We are using
 tomcat because we've read somewhere that it should be more robust in
 heavily loaded production environments.

Both are servlet containers - a Java executable server program that can
run other programs written using the Java Servlet API.  The servlet API
was invented by Sun, who also invented Java itself.

For comparison purposes, first think about Apache's HTTPD, which is a
web server designed to serve files.  Through its rich modular
capability, it does have the ability to run web applications, but the
core HTTPD is designed to grab a file off the hard drive and send it to
a user.

A servlet container is different.  You can think of a servlet container
as a smart web server designed from the ground up to run web applications.

http://en.wikipedia.org/wiki/Java_Servlet

Solr is a servlet.  By itself, Solr can't run.  It requires a servlet
container.

Here is what wikipedia has to say about the histories of the two projects:

http://en.wikipedia.org/wiki/Apache_Tomcat#History
http://en.wikipedia.org/wiki/Jetty_%28Web_server%29#History

If you google "difference between jetty and tomcat" you'll find a lot of
links.  The one written by Jetty folks is particularly detailed, but has
an obvious bias.

With emphasis on tuning, you can probably get good performance out of
either container.  Jetty is smaller with a default configuration, but as
others have pointed out, most of the resource utilization will be done
by Solr, not the container.

There is one major reason that I chose to use Jetty.  It was already
there in the Solr download.  The reasons that I have stuck with it even
after having time to research: It works well, and it is extensively
tested every time anyone runs the tests that come with the Solr build
system.

Thanks,
Shawn



Re: Update on shards

2013-04-24 Thread Arkadi Colson

Thx!

On 04/24/2013 04:46 PM, Shawn Heisey wrote:

On 4/24/2013 12:49 AM, Arkadi Colson wrote:

We are using tomcat so we'll just wait. Hopefully it's fixed in 4.3 but
we have a work around for now so...

What exactly is the difference between jetty and tomcat. We are using
tomcat because we've read somewhere that it should be more robust in
heavily loaded production environments.

Both are servlet containers - a Java executable server program that can
run other programs written using the Java Servlet API.  The servlet API
was invented by Sun, who also invented Java itself.

For comparison purposes, first think about Apache's HTTPD, which is a
web server designed to serve files.  Through its rich modular
capability, it does have the ability to run web applications, but the
core HTTPD is designed to grab a file off the hard drive and send it to
a user.

A servlet container is different.  You can think of a servlet container
as a smart web server designed from the ground up to run web applications.

http://en.wikipedia.org/wiki/Java_Servlet

Solr is a servlet.  By itself, Solr can't run.  It requires a servlet
container.

Here is what wikipedia has to say about the histories of the two projects:

http://en.wikipedia.org/wiki/Apache_Tomcat#History
http://en.wikipedia.org/wiki/Jetty_%28Web_server%29#History

If you google difference between jetty and tomcat you'll find a lot of
links.  The one written by Jetty folks is particularly detailed, but has
an obvious bias.

With emphasis on tuning, you can probably get good performance out of
either container.  Jetty is smaller with a default configuration, but as
others have pointed out, most of the resource utilization will be done
by Solr, not the container.

There is one major reason that I chose to use Jetty.  It was already
there in the Solr download.  The reasons that I have stuck with it even
after having time to research: It works well, and it is extensively
tested every time anyone runs the tests that come with the Solr build
system.

Thanks,
Shawn







Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Shawn Heisey
On 4/24/2013 2:02 AM, Furkan KAMACI wrote:
 Lucidworks Solr Guide says that:
 
 If you are using Sun's JVM, add the -server command-line option when you
 start Solr. This tells the JVM that it should optimize for a long running,
 server process. If the Java runtime on your system is a JRE, rather than a
 full JDK distribution (including javac and other development tools), then
 it is possible that it may not support the -server JVM option
 
 Does any folks using -server parameter? Also what parameters you are using
 to start up Solr? I mean parallel garbage collector vs.?

The answers to your questions are hotly debated in Java communities.
This is treading on religious ground. :)

I never actually use the -server parameter.  When java runs on my
multiprocessor 64-bit Linux machines, it already knows it should be in
server mode.  If you run on a platform that Java decides is a client
machine, you might need the -server parameter.

Most people agree that you should use the CMS collector.  You won't find
much agreement about anything else on the startup commandline.  I can
tell you what I use.  It may work for you, apart from the specific value
of the -Xmx parameter.  These parameters result in fairly low GC pause
times for me.  I can tell you that I have arrived at these parameters
through testing that wasn't very methodical, so they are probably not
the optimal settings:

-Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts

The G1 collector is supposed to work for all situations without tuning,
but it didn't work for me.  GC pause times were just as long as when I
had a badly tuned CMS setup.

Thanks,
Shawn



Noob question: why doesn't this query work?

2013-04-24 Thread Brian Hurt
So, I'm executing the following query:
id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND (NOT
id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT
i_4:6142E=m)

It's machine generated, which explains the redundancies.  The problem is
that the query returns no results, but there is a document that should
match: it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3 field
of 6111, a numfields of 611A, a d_4 of true (though this shouldn't
matter), and an i_4 of 6142F1S.

The problem seems to be with the negations.  I did try replacing the NOTs
with -'s, so that, for example, NOT id:6178ZwWj5m would become
-id:6178ZwWj5m, but this didn't seem to work.

Help?  What's wrong with the query?  Thanks.

Brian


Re: Solr faceted search UI

2013-04-24 Thread Erik Hatcher
It's a pretty subjective and opinionated kinda thing here, as UIs are built 
with all sorts of technologies, and even though I'm quite opinionated about how 
*I* would build something, I work with a lot of folks that have their own 
preferences or organizational standards/constraints on what they can use.  
Pragmatically speaking, it's best to use what you or your team are familiar 
with.

That being said... if this is strictly for a PoC and not something you need to 
put into production as-is, you can leverage the /browse feature powered by 
Solr's VelocityResponseWriter (wt=velocity) that is in  Solr's example 
configuration.

I'm not aware of any Java-based framework out there for Solr - there are so many 
choices (Struts?  Tapestry?  JSPs?  etc.) that any single one of them would be 
off-putting to others.

In Java, the SolrJ library is what you want to use for remote access to Solr.  
You'll get back a Java response object that you can navigate to pull out the 
facet information to hand to your view tier.
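
For example, a minimal SolrJ sketch (the host, core and facet field names are
invented for illustration; assumes the 4.x HttpSolrServer class - on 3.x you'd
use CommonsHttpSolrServer instead):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetExample {
  public static void main(String[] args) throws Exception {
    // point at the remote Solr core/collection
    HttpSolrServer server = new HttpSolrServer("http://remotehost:8983/solr/collection1");

    SolrQuery query = new SolrQuery("*:*");
    query.setFacet(true);                // turn faceting on
    query.addFacetField("category");     // hypothetical facet field
    query.setFacetMinCount(1);           // skip zero-count buckets

    QueryResponse rsp = server.query(query);

    // walk the facet counts and hand them to the view tier
    for (FacetField ff : rsp.getFacetFields()) {
      System.out.println(ff.getName());
      for (FacetField.Count c : ff.getValues()) {
        System.out.println("  " + c.getName() + " (" + c.getCount() + ")");
      }
    }
  }
}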

If you're ok with something not Java (but can be deployed in a Java container 
and can interact with Java) then give projectblacklight.org a try - it's a Ruby 
on Rails full featured front-end to Solr.  There's also solrstrap that looks 
like a fun place to do some lightweight PoC development.

Erik


On Apr 24, 2013, at 10:43 , richa wrote:

 Hi,
 I am working on a POC, where I have to display faceted search result on web
 page. can anybody please help me to suggest what all set up I need to
 configure to display. I would prefer java technologies. Just to mention, I
 have solr cloud running on remote server.
 I would like to know:
 1. Should I use MVC framework?
 2. How will my local interact with remote solr server?
 3. How will I send query through java code and what technology I should use
 to display faceted search result?
 
 Please help me on this.
 
 Thanks,
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr faceted search UI

2013-04-24 Thread Majirus FANSI
Hi richa,
You can use solrJ (http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr)
to query your solr index.
On the wiki page indicated, you will see an example of faceted search using
solrJ.
The 2009 article by Yonik available on searchhub
(http://searchhub.org/2009/09/02/faceted-search-with-solr/) is
a good tutorial on faceted search.
Whether you go for an MVC framework or not is up to you. It is recommended,
though, to develop search engine applications in a Service Oriented
Architecture.
Regards,

Maj


On 24 April 2013 16:43, richa striketheg...@gmail.com wrote:

 Hi,
 I am working on a POC, where I have to display faceted search result on web
 page. can anybody please help me to suggest what all set up I need to
 configure to display. I would prefer java technologies. Just to mention, I
 have solr cloud running on remote server.
 I would like to know:
 1. Should I use MVC framework?
 2. How will my local interact with remote solr server?
 3. How will I send query through java code and what technology I should use
 to display faceted search result?

 Please help me on this.

 Thanks,



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Book text with chapter line number

2013-04-24 Thread Alexandre Rafalovitch
It seems that the normal use case is line=document, with some exceptions
for cross-line indexing.

The edge case could be solved either by indexing additional 'two-line'
documents with a lower boost, or by having a 'context' field with the line
before/after where applicable (e.g. within the same para).  Then there
might also be some trick around using the highlighter to figure out
whether the match came from the 'line' field or from the 'context' field.

I also like the payload idea, though there does not seem to be too much
information around on using that.
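
A rough SolrJ sketch of the two-line-document idea (all field names and the
0.5 boost are invented for illustration):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LineIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    String[] lines = {"first line of text", "second line of text"};

    for (int i = 0; i < lines.length; i++) {
      // one document per line
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "book1-ch1-line" + (i + 1));
      doc.addField("book", "book1");
      doc.addField("chapter", 1);
      doc.addField("line", i + 1);
      doc.addField("text", lines[i]);
      server.add(doc);

      // overlapping two-line document with a lower boost, to catch
      // phrases that span a line break
      if (i + 1 < lines.length) {
        SolrInputDocument span = new SolrInputDocument();
        span.addField("id", "book1-ch1-line" + (i + 1) + "-span");
        span.addField("book", "book1");
        span.addField("chapter", 1);
        span.addField("line", i + 1);
        span.addField("text", lines[i] + " " + lines[i + 1]);
        span.setDocumentBoost(0.5f); // rank spans below single lines
        server.add(span);
      }
    }
    server.commit();
  }
}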

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 10:28 AM, Paul Libbrecht p...@hoplahup.net wrote:
 It's easy to then store a map of term position to line-number and 
 page-number along with each paragraph, or?

 Paul


 On 24 avr. 2013, at 16:24, Timothy Potter wrote:

 Chapter seems too broad and line seems too narrow -- have you thought
 about paragraph level? Something like:

 docID, book fields (title, author, publisher, etc), chapter fields (#,
 title, pages, etc), section fields (title, #, etc), sub-sectionN
 fields, paragraph text, lines

 Seems like line #'s would only be useful for display so just store the
 lines the paragraph covers.



 On Tue, Apr 23, 2013 at 7:51 PM, Walter Underwood wun...@wunderwood.org 
 wrote:
 If you can represent your books in XML, then MarkLogic could do the job 
 very cleanly. It isn't free, but it is very good.

 wunder

 On Apr 23, 2013, at 6:47 PM, Jason Funk wrote:

 Is there a better tool than Solr to use for my situation?


 On Apr 23, 2013, at 5:04 PM, Jack Krupansky j...@basetechnology.com 
 wrote:

 There is no simple, obvious, and direct approach, right out of the box. 
 Sure, you can highlight passages of raw text, right out of the box, but 
 that won't give you chapters, pages, and line numbers. To do all of that, 
 you would have to either:

 1. Add chapter, page, and line number as part of the payload for each 
 word. And add some custom document transformers to access the information.
 or
 2. Index each line as a separate Solr document, with fields for book, 
 chapter, page, and line number.

 -- Jack Krupansky

 -Original Message- From: Jason Funk
 Sent: Tuesday, April 23, 2013 5:02 PM
 To: solr-user@lucene.apache.org
 Subject: Book text with chapter line number

 Hello.

 I'm trying to figure out if Solr is going to work for a new project that 
 I want to build. At its heart it's a book text searching 
 application. Each book is broken into chapters and each chapter is broken 
 into lines. I want to be able to search these books and return relevant 
 sections of the book and display the results with chapter and line 
 number. I'm not sure how I would structure my data so that it's efficient 
 and functional. I could simply treat each line of text as a document 
 which would provide some of the functionality but what if the search 
 query spanned two lines? Then it seems the passage the user was searching 
 for wouldn't be returned. I could treat each book as a document and use 
 highlighting to find the context but that seems to limit 
 weighting/results for best matches as well as difficultly in finding 
 chapter/line numbers. What is the best way to do this with Solr?

 Is there a better tool to use to solve my problem?


 --
 Walter Underwood
 wun...@wunderwood.org






Re: Re: Support of field variants in solr

2013-04-24 Thread Alexandre Rafalovitch
You can certainly specify all your aliases in the request. The request
handler is just there to simplify the client by allowing it to specify
a different URL with everything else mapped on the server. And, of
course, with a request handler you can lock the parameters to force
them.
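
As a rough illustration, passing the aliases as plain request parameters
from SolrJ might look like this (host, core and field names are invented):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class AliasedSearch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("apache solr");
    q.set("defType", "edismax");
    q.set("qf", "content");              // clients query a virtual 'content' field
    q.set("f.content.qf", "content_de"); // ...which eDisMax aliases to the German field
    System.out.println(server.query(q).getResults().getNumFound());
  }
}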

Regarding language detection during indexing, there is a module for
that: http://wiki.apache.org/solr/LanguageDetection . Hopefully that
would be sufficient.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, Apr 23, 2013 at 4:45 PM, Timo Schmidt timo-schm...@gmx.net wrote:
 Ok, thanks for this hint. I have two further questions to understand it 
 completely.

 Setting up a custom request handler makes it easier to avoid all the mapping 
 parameters in the query, but it
 would also be possible with one request handler and all mappings in the 
 request arguments, right?

 What about indexing - is there also a mechanism like this, or should the 
 application decide which target field to use?


 Gesendet: Dienstag, 23. April 2013 um 02:32 Uhr
 Von: Alexandre Rafalovitch arafa...@gmail.com
 An: solr-user@lucene.apache.org
 Betreff: Re: Support of field variants in solr
 To route different languages, you could use different request handlers
 and do different alias mapping. There are two alias mapping:
 On the way in for eDisMax:
 https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
 On the way out: 
 https://wiki.apache.org/solr/CommonQueryParameters#Field_alias

 Between the two, you can make sure that all searches to /searchES map
 'content' field to 'content_es' and for /searchDE map 'content' to
 'content_de'.

 Hope this helps,
 Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
 book)


 On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt timo-schm...@gmx.net wrote:
 Hi together,

 I am Timo and work for a solr implementation company. During our last 
 projects we came to know that we need to be able to generate different 
 variants of a document.

 Example 1 (Language):

 To handle all documents in one solr core, we need a field variant for each 
 language.


 content for spanish content

 field name=content type=text_es indexed=true stored=true 
 variant=“es“ /

 content for german content

 field name=content type=text_de indexed=true stored=true 
 variant=“de“ /


 Each of these fields can be configured in the solr schema to act optimal for 
 the specific taget language.

 Example 2 (Stores):

 We have customers who want to sell the same product in different stores for 
 different prices.


 price in frankfurt

 field name=price type=sfloat indexed=true stored=true variant=“fr“ 
 /

 price in paris

 field name=price type=sfloat indexed=true stored=true variant=“pr“ 
 /

 To solve this in an optimal way it would be nice if this worked completely 
 transparently inside solr by defining a „variantQuery“

 A select query could look like this:

 select?variantQuery=fr&qf=price,content

 Additionally the following is possible. If no variant is present, behaviour 
 should be as before, so it should be relevant for all queries.

 The setting variant=“*“ would mean: There can be several wildcard variants 
 defined in a committed document. This makes sense when the data type would be 
 the same for all variants and you will have many variants (like in the price 
 example).

 The same as during query time should be possible during indexing time.

 I know that we can do something like this with dynamic fields too, but then 
 we need to resolve the concrete fields at index and query time on the 
 application level. That is possible, but it would be nicer to have a concept 
 like this in solr. Also, working with facets is easier with this approach 
 when the concrete fieldname does not need to be populated in the application.

 So my questions are:

 What do you think about this approach?
 Is it better to work with dynamic fields? Is it reasonable when you have 200 
 variants or more of a document?
 What needs to be done in solr to have something like this variant attribute 
 for fields?
 Do you have other approaches?


Re: ranking score by fields

2013-04-24 Thread Majirus FANSI
Hi Alex,
Back to your original requirement, I think you can do the job on the
client side.  As Erik noted, the highlighter component can help. You are right
that it marks terms but not the search text. But analyzing the search text with
the appropriate analyzer will give you the terms of your text as used by the
highlighter component.
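
For instance, you can even ask Solr itself to analyze the search text from
SolrJ - roughly like this (a sketch from memory against the /analysis/field
handler, so double-check the exact method names; the URL and 'field1' are
invented):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class AnalyzeQueryText {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    FieldAnalysisRequest req = new FieldAnalysisRequest("/analysis/field");
    req.addFieldName("field1");   // analyze with field1's analyzer chain
    req.setFieldValue("");        // the request insists on a field value
    req.setQuery("apache solr");  // the search text we want the terms of

    FieldAnalysisResponse rsp = req.process(server);
    FieldAnalysisResponse.Analysis analysis = rsp.getFieldNameAnalysis("field1");
    // the final query-time phase holds the terms the highlighter matches on
    for (AnalysisResponseBase.AnalysisPhase phase : analysis.getQueryPhases()) {
      for (AnalysisResponseBase.TokenInfo token : phase.getTokens()) {
        System.out.println(phase.getClassName() + " -> " + token.getText());
      }
    }
  }
}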
Hope this helps.
Cheers,

Maj


On 24 April 2013 16:02, Каскевич Александр akaskev...@prontosoft.by wrote:

 The highlighter doesn’t help me. It marks terms but not the search text.
 F.e. I have a doc with field1=apache lucene, field2=apache solr. I
 search apache solr with the AND default option. I find this doc with
 highlighted field1=<em>apache</em> lucene. This is a bad result for me.

 Look I want to do something like this:

 Search text: apache solr
 RESULT:
 Found in field1
 Doc1
 Doc2
 Doc3
 ...
 Found in field2
 Doc101
 Doc102
 Doc103
 ...

 Search result have two (or more) parts. Each part sorted by other field
 f.e. by field date desc.

 It mean I need make right sort and I need some flag to insert Found in
 field2 text.

 I try
 q=apache solr
 fl=field1, field2, score, val1:$q1, val2:$q2
 defType=dismax qf=field1^1000 field2^1
 q1={!dismax qf=field1 v='apache solr'}
 q2={!dismax qf=field2 v='apache solr'}

 Now I have flags: val1 > 0 means found in field1.
 But now I have a problem with sort: I can't use val1, val2 in sort :(.

 And now my questions:
 1. Is it possible to use my custom fields val1, val2 in a sort? With a
 formula, or the params $q1, $q2?
 2. Maybe it is possible to set the score by a formula at query time?
 3. Any other variant?

 Thanks.
 Alex.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, April 23, 2013 1:53 AM
 To: solr-user@lucene.apache.org
 Subject: Re: ranking score by fields

 You can sometimes use the highlighter component to do this, but it's a
 little tricky...

 But note your syntax isn't doing what you expect.
 (field1:apache solr) parses as field1:apache defaultfield:solr. You want
 field1:(apache solr)

 debug=all is your friend for these kinds of things, especially the parsed
 query section

 Best
 Erick

 On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр 
 akaskev...@prontosoft.by wrote:
  Hi.
  I want to do what the subject says, but don't know exactly how I can do it.
  Example.
  I have index with field1, field2, field3.
  I make a query like:
  (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
  And I want to know: was this doc found by field1 or by field2 or by
 field3?
 
  I try to make like this: (field1:apache solr)^100 OR (field2:apache
  solr)^10 OR (field3:apache solr)^1 But the problem is that I don't know
 range, minimum and maximum value of score for each field.
  With other types of similarities (BM25 or others) it's the same situation.
  I can't find information about this in the manual.
 
  Also, I tried to use Relevance Functions, f.e. termfreq, but it works only
 with terms, not with phrases, like apache solr.
 
  Maybe I am missing something, or you have another idea to do this?
  Also, I am not a java programmer, and the best way for me doesn't involve
 writing any plugins for solr.
 
  Thanks.
  Alex.



Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Mark Miller

On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Lucidworks Solr Guide says that:
 
 If you are using Sun's JVM, add the -server command-line option when you
 start Solr. This tells the JVM that it should optimize for a long running,
 server process. If the Java runtime on your system is a JRE, rather than a
 full JDK distribution (including javac and other development tools), then
 it is possible that it may not support the -server JVM option
 
 Does any folks using -server parameter? Also what parameters you are using
 to start up Solr? I mean parallel garbage collector vs.?

Unless you are using 32-bit Windows, you are probably getting the server JVM. 
It's not a bad idea to use -server to be sure - it's certainly preferable to 
-client for Solr.

You should generally use the concurrent low pause garbage collector with Solr. 

- Mark



Re: Solr - WordDelimiterFactory with Custom Tokenizer to split only on Boundaries

2013-04-24 Thread Jack Krupansky
The WDF types will treat a character the same regardless of where it 
appears.


For something conditional, like a dot between letters vs. a dot not preceded and 
followed by a letter, you either have to have a custom tokenizer or a 
character filter.


Interesting that although the standard tokenizer messes up embedded hyphens, 
it does handle the embedded dot vs. trailing dot case as you wish (but 
messes up U.S.A. by stripping the trailing dot) - but that doesn't help 
your case.


A character filter like the following might help your case:

fieldType name=text_ws_dot class=solr.TextField 
positionIncrementGap=100

 analyzer
   charFilter class=solr.PatternReplaceCharFilterFactory 
pattern=([\w\d])[\._&]+($|[^\w\d]) replacement=$1 $2 /
   charFilter class=solr.PatternReplaceCharFilterFactory 
pattern=(^|[^\w\d])[\._&]+($|[^\w\d]) replacement=$1 $2 /
   charFilter class=solr.PatternReplaceCharFilterFactory 
pattern=(^|[^\w\d])[\._&]+([\w\d]) replacement=$1 $2 /

   tokenizer class=solr.WhitespaceTokenizerFactory/
 /analyzer
/fieldType

I'm not a regular expression expert, so I'm not sure whether/how those 
patterns could be combined.
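
As a quick sanity check, here is a small plain-Java sketch applying the same 
three patterns outside of Solr (the sample inputs are invented):

public class PatternTest {
  public static void main(String[] args) {
    String[] inputs = {"test.com", "newyear.", "new_car", ".leading"};
    for (String in : inputs) {
      String out = in
        // delimiter run between a word char and a non-word char (or end)
        .replaceAll("([\\w\\d])[\\._&]+($|[^\\w\\d])", "$1 $2")
        // delimiter run between a non-word char (or start) and a non-word char (or end)
        .replaceAll("(^|[^\\w\\d])[\\._&]+($|[^\\w\\d])", "$1 $2")
        // delimiter run between a non-word char (or start) and a word char
        .replaceAll("(^|[^\\w\\d])[\\._&]+([\\w\\d])", "$1 $2");
      // test.com and new_car survive; trailing/leading delimiters become spaces
      System.out.println(in + " -> [" + out + "]");
    }
  }
}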


Also, that doesn't allow the case of a single ., , or _ as a word - 
but you didn't specify how that case should be handled.




-- Jack Krupansky
-Original Message- 
From: meghana

Sent: Wednesday, April 24, 2013 6:49 AM
To: solr-user@lucene.apache.org
Subject: Solr - WordDelimiterFactory with Custom Tokenizer to split only on 
Boundaries


I have configured WordDelimiterFilterFactory with custom types for '&'
and '-', and for a few characters (like . _ :) we need to split on boundaries
only.

e.g.
test.com (should be tokenized to test.com)
newyear.  (should be tokenized to newyear)
new_car (should be tokenized to new_car)
..
..

Below is defination for text field

fieldType name=text_general_preserved class=solr.TextField
positionIncrementGap=100
 analyzer type=index
tokenizer class=solr.WhitespaceTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=false /
filter class=solr.WordDelimiterFilterFactory
   splitOnCaseChange =0
   splitOnNumerics =0
   stemEnglishPossessive =0
   generateWordParts=1
   generateNumberParts=1
   catenateWords=0
   catenateNumbers=0
   catenateAll=0
   preserveOriginal=0
   protected=protwords_general.txt
   types=wdfftypes_general.txt
   /

   filter class=solr.LowerCaseFilterFactory/
 /analyzer
 analyzer type=query
   tokenizer class=solr.WhitespaceTokenizerFactory/
   filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=false /
   filter class=solr.WordDelimiterFilterFactory
   splitOnCaseChange =0
   splitOnNumerics =0
   stemEnglishPossessive =0
   generateWordParts=1
   generateNumberParts=1
   catenateWords=0
   catenateNumbers=0
   catenateAll=0
   preserveOriginal=0
   protected=protwords_general.txt
   types=wdfftypes_general.txt
   /
   filter class=solr.SynonymFilterFactory synonyms=synonyms.txt
ignoreCase=true expand=true/
   filter class=solr.LowerCaseFilterFactory/
 /analyzer
   /fieldType

below is wdfftypes_general.txt content

& = ALPHA
- = ALPHA
_ = SUBWORD_DELIM
: = SUBWORD_DELIM
. = SUBWORD_DELIM

The types that can be used in the word delimiter are LOWER, UPPER, ALPHA, DIGIT,
ALPHANUM, SUBWORD_DELIM. There's no description available for the use of each
type. Going by the name, I thought the type SUBWORD_DELIM might fulfill my need,
but it doesn't seem to work.

Can anybody suggest how I can configure the word delimiter factory
to fulfill my requirement?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-WordDelimiterFactory-with-Custom-Tokenizer-to-split-only-on-Boundires-tp4058557.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr faceted search UI

2013-04-24 Thread richa
Thank you very much for your suggestion. 
This is only for a PoC. As you suggested blacklight: can I run this on
windows, and to build the PoC do I have to have ruby on rails knowledge?

Irrespective of any technology, and considering the fact that in the past I
worked on java and j2ee, what would you suggest, or how would you have
proceeded with this?

Blacklight seems to be a good option, but I am not sure whether, without prior
knowledge of ruby on rails, I will be able to present in a short period of
time. Any suggestion on this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598p4058617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Noob question: why doesn't this query work?

2013-04-24 Thread Shawn Heisey
On 4/24/2013 8:59 AM, Brian Hurt wrote:
 So, I'm executing the following query:
 id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND (NOT
 id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT
 i_4:6142E=m)
 
 It's machine generated, which explains the redundancies.  The problem is
 that the query returns no results- but there is a document that should
 match- it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3 field
 of 6111, a numfields of 611A, a d_4 of true (but this shouldn't
 matter), and an i_4 of 6142F1S.
 
 The problem seems to be with the negations.  I did try to replace the NOT's
 with -'s, so, for example, NOT id:6178ZwWj5m would become
 -id:6178ZwWj5m, and this didn't seem to work.
 
 Help?  What's wrong with the query?  Thanks.

It looks like you might have meant to negate all of the query clauses
inside the last set of parentheses.  That's not what your actual query
says. If you change your negation so that the NOT is outside the
parentheses, so that it reads AND NOT (... OR ...), that should fix
that part of it.

If the boolean layout you have is really what you want, then you need to
change the negation queries to (*:* -query) instead, because pure
negative queries are not supported.  That syntax says all documents
except those that match the query.  For simple negation queries, Solr
can figure out that it needs to add the *:* internally, but this query
is more complex.

A few other possible problems:

A backslash is a special character used to escape other special
characters, so you *might* need two of them - one to say 'the next
character is literal' and one to actually be the backslash.  If you
follow the advice in the next paragraph, I can guarantee this will be
the case.  For that reason, you might want to keep the quotes on fields
that might contain characters that have special meaning to the Solr
query parser.

Don't use quotes unless you really are after phrase queries or you can't
escape special characters.  You might actually need phrase queries for
some of this, but I would try simple one-field queries without the
quotes to see whether you need them.  I have no idea what happens if you
include quotes inside a range query (the 6114), but it might not do
what you expect.  I would definitely remove the quotes from that part of
the query.

Thanks,
Shawn



Re: Solr faceted search UI

2013-04-24 Thread Alexandre Rafalovitch
I tried a previous version of blacklight (on a Mac) and was able to get
it to the demo stage without much RoR knowledge. The facet field
declarations were all in the config files. You should be able to get a
go/nogo decision in under four hours.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 11:23 AM, richa striketheg...@gmail.com wrote:
 Thank you very much for your suggestion.
 This is only for PoC. As you suggested about blacklight, can I run this on
 windows and to build PoC do I have to have ruby on rails knowledge?

 Irrespective of any technology and considering the fact that in past I had
 worked on java, j2ee what would you suggest or how would you have proceeded
 for this?

 Blacklight seems to be a good option, not sure without prior knowledge of
 ruby on rails, will I be able to present in short period of time? any
 suggestion on this?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598p4058617.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Timothy Potter
Just verifying that it is also recommended to use the JVM options to
kill on OOM? I vaguely recall a message from Mark about this some time
ago:

-XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError

On Wed, Apr 24, 2013 at 9:13 AM, Mark Miller markrmil...@gmail.com wrote:

 On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Lucidworks Solr Guide says that:

 If you are using Sun's JVM, add the -server command-line option when you
 start Solr. This tells the JVM that it should optimize for a long running,
 server process. If the Java runtime on your system is a JRE, rather than a
 full JDK distribution (including javac and other development tools), then
 it is possible that it may not support the -server JVM option

 Does any folks using -server parameter? Also what parameters you are using
 to start up Solr? I mean parallel garbage collector vs.?

 Unless you are using 32-bit Windows, you are probably getting the server JVM. 
 It's not a bad idea to use -server to be sure - it's certainly preferable to 
 -client for Solr.

 You should generally use the concurrent low pause garbage collector with Solr.

 - Mark



Re: Solr 3.6.1: changing a field from stored to not stored

2013-04-24 Thread Jan Høydahl
 I would create a new core as slave of the existing configuration without
 replicating the core schema and configuration. This way I can get the

This won't work, as master/slave replication copies the index files as-is.

You should re-index all your data. You don't need to take down the cluster
to do that, just re-index on top of what's there already, and your index
will become smaller and smaller as merging kicks out the old data :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

24. apr. 2013 kl. 15:59 skrev Majirus FANSI majirus@gmail.com:

 I would create a new core as slave of the existing configuration without
 replicating the core schema and configuration. This way I can get the
 information from one index to the other while saving the space as fields in
 the new schema are mainly not stored. After the replication I would swap
 the cores for the online core to point to the right index dir and conf.
 i.e. the one with less stored fields.
 
 Maj
 
 
 On 24 April 2013 01:48, Petersen, Robert
 robert.peter...@mail.rakuten.comwrote:
 
 Hey I just want to verify one thing before I start doing this:  function
 queries only require fields to be indexed but don't require them to be
 stored right?
 
 -Original Message-
 From: Petersen, Robert [mailto:robert.peter...@mail.rakuten.com]
 Sent: Tuesday, April 23, 2013 4:39 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr 3.6.1: changing a field from stored to not stored
 
 Good info, Thanks Hoss!  I was going to add a more specific fl= parameter
 to my queries at the same time.  Currently I am doing fl=*,score so that
 will have to be changed.
 
 
 -Original Message-
 From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
 Sent: Tuesday, April 23, 2013 4:18 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 3.6.1: changing a field from stored to not stored
 
 
 : index?  I noticed I am unnecessarily storing some fields in my index and
 : I'd like to stop storing them without having to 'reindex the world' and
 : let the changes just naturally percolate into my index as updates come
 : in the normal course of things.  Do you guys think I could get away with
 : this?
 
 Yes, you can easily get away with this type of change w/o re-indexing,
 however you won't gain any immediate index size savings until each and
 every existing doc has been reindexed and the old copies expunged from the
 index via segment merges.
 
 the one hiccup that can affect people when doing this is what happens if
 you use something like fl=* (and likely hl=* as well) ... many places
 in Solr will try to avoid failure if a stored field is found in the index
 which isn't defined in the schema, and treat that stored value as a string
 (legacy behavior designed to make it easier for people to point Solr at old
 lucene indexes built w/o using Solr) ... so if these stored values are not
 strings, you might get some weird data in your response for these documents.
 
 
 -Hoss
 
 
 
 
 



Re: Solr faceted search UI

2013-04-24 Thread richa
Hi Maj,

Thanks for your suggestion.
Tell me one thing: do you have any example of solrj? Suppose I decide to
use solrj in a simple web application to display faceted search on a web page.
Where will this fit in? What will be the flow?

Please suggest.

Thanks


On Wed, Apr 24, 2013 at 11:01 AM, Majirus FANSI [via Lucene] 
ml-node+s472066n4058610...@n3.nabble.com wrote:

 Hi richa,
 You can use solrJ (
 http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr)
 to query your solr index.
 On the wiki page indicated, you will see example of faceted search using
 solrJ.
 The 2009 article by Yonik available on searchhub
 (http://searchhub.org/2009/09/02/faceted-search-with-solr/) is
 a good tutorial on faceted search.
 Whether you go for MVC framework or not is up to you. It is recommend
 tough
 to develop search engine application in a Service Oriented Architecture.
 Regards,

 Maj


 On 24 April 2013 16:43, richa [hidden email] wrote:

  Hi,
  I am working on a POC, where I have to display faceted search result on
 web
  page. can anybody please help me to suggest what all set up I need to
  configure to display. I would prefer java technologies. Just to mention,
 I
  have solr cloud running on remote server.
  I would like to know:
  1. Should I use MVC framework?
  2. How will my local interact with remote solr server?
  3. How will I send query through java code and what technology I should
 use
  to display faceted search result?
 
  Please help me on this.
 
  Thanks,
 
 
 
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-faceted-search-UI-tp4058598p4058619.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Mark Miller
Yes, I recommend this. You can't predict a JVM that has had an OOM - so it's 
best to neutralize it. We have seen cases where the node was messed up but 
still advertised as active and good in zk due to OOM's. Behavior after an OOM 
is undefined.

I was actually going to ask if you were positive you had restarted that node in 
the other OOM thread, because that sounded similar. Just a straw to grasp for, 
as I'd guess you are sure you did restart it.

- Mark

On Apr 24, 2013, at 11:37 AM, Timothy Potter thelabd...@gmail.com wrote:

 Just verifying that it is also recommended to use the JVM options to
 kill on OOM? I vaguely recall a message from Mark about this sometime
 ago:
 
 -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError
 
 On Wed, Apr 24, 2013 at 9:13 AM, Mark Miller markrmil...@gmail.com wrote:
 
 On Apr 24, 2013, at 4:02 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 
 Lucidworks Solr Guide says that:
 
 If you are using Sun's JVM, add the -server command-line option when you
 start Solr. This tells the JVM that it should optimize for a long running,
 server process. If the Java runtime on your system is a JRE, rather than a
 full JDK distribution (including javac and other development tools), then
 it is possible that it may not support the -server JVM option
 
 Does any folks using -server parameter? Also what parameters you are using
 to start up Solr? I mean parallel garbage collector vs.?
 
 Unless you are using 32-bit Windows, you are probably getting the server 
 JVM. It's not a bad idea to use -server to be sure - it's certainly 
 preferable to -client for Solr.
 
 You should generally use the concurrent low pause garbage collector with 
 Solr.
 
 - Mark
 



Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Mark Miller

On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

 -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError

The way I like to handle this is to have the OOM trigger a little script or set 
of cmds that logs the issue and kills the process.

Then if you have the process supervised (via runit or something), it will just 
start back up (what else do you do after an OOM?), but you will have logged 
something, triggered a notification, whatever.

- Mark

Solr indexing Partially working

2013-04-24 Thread vishal gupta
Hi, I am using Solr 4.2.0 and extension 2.8.2 with Typo3. Whenever I try to
index pages and news pages, it gets only 3.29% indexed. I checked a
developer log and found an error in solrservice.php. And in the solr admin it
is giving Dups is not defined please add it. What should I do in this case?
If possible please send me the settings of schema.xml and solrconfig.xml. I
am new to both typo3 and solr.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indeing-Partially-working-tp4058623.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Timothy Potter
I like the idea of running a script vs. kill -9 ;-) Right now when a
node fails, we have monitors for whether a node is up and serving
queries. If not, that triggers some manual investigation and restart
process. Part of the process was to capture the logs and heap dump
file. What happened previously is that the log capture part wasn't
scripted into the restart process and so the logs got wiped out when
the restart happened :-(

One question about this - when you say logs the issue from your
script - what type of things do you log? I've been relying on the
timestamp of the heap dump (hprof) as a way to trace back into our log
files.

Thanks.
Tim

On Wed, Apr 24, 2013 at 10:03 AM, Mark Miller markrmil...@gmail.com wrote:

 On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:

 -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError

 The way I like to handle this is to have the OOM trigger a little script or 
 set of cmds that logs the issue and kills the process.

 Then if you have the process supervised (via runit or something), it will 
 just start back up (what else do you do after an OOM?), but you will have 
 logged something, triggered a notification, whatever.

 - Mark


Deletes and inserts

2013-04-24 Thread Jon Strayer
We are using a Solr collection to serve auto complete suggestions.  We'd
like for updates to happen without any noticeable delay for the users.

I've been looking at adding new cores, loading them with the new data and
then swapping them with the current ones, but I don't see how that
would work in a cloud installation.  It seems that when I create a new core
it is part of the collection and the old data will start replicating to it.
 Is that correct?

I've also looked at standing up a new collection and then adding an alias
for it, but that's not well documented.  If the alias already exists and I
add it to another collection, is it removed from the first collection?

I'm open to any suggestions.

-- 
To *know* is one thing, and to know for certain *that* we know is another.
--William James


Re: JVM Parameters to Startup Solr?

2013-04-24 Thread Mark Miller

On Apr 24, 2013, at 12:22 PM, Timothy Potter thelabd...@gmail.com wrote:

 I like the idea of running a script vs. kill -9 ;-) Right now when a
 node fails, we have monitors for whether a node is up and serving
 queries. If not, that triggers some manual investigation and restart
 process. Part of the process was to capture the logs and heap dump
 file. What happened previously is that the log capture part wasn't
 scripted into the restart process and so the logs got wiped out when
 the restart happened :-(
 
 One question about this - when you say logs the issue from your
 script - what type of things do you log? I've been relying on the
 timestamp of the heap dump (hprof) as a way to trace back into our log
 files.

Yeah, that's pretty much it - the time of the event and the fact that an OOM 
occurred. If you are dropping a heap dump, that has the same info, but a log is 
just a nice compact little history of events.

- Mark

 
 Thanks.
 Tim
 
 On Wed, Apr 24, 2013 at 10:03 AM, Mark Miller markrmil...@gmail.com wrote:
 
 On Apr 24, 2013, at 12:00 PM, Mark Miller markrmil...@gmail.com wrote:
 
 -XX:OnOutOfMemoryError=kill -9 %p -XX:+HeapDumpOnOutOfMemoryError
 
 The way I like to handle this is to have the OOM trigger a little script or 
 set of cmds that logs the issue and kills the process.
 
 Then if you have the process supervised (via runit or something), it will 
 just start back up (what else do you do after an OOM?), but you will have 
 logged something, triggered a notification, whatever.
 
 - Mark



Re: How to let Solr load libs from within my JAR?

2013-04-24 Thread Michael Della Bitta
If you want to pack JARs inside JARs, you can use something that does
classloader magic like One-JAR, but it's usually good to avoid things
like that unless you really need them. Alternatively, you could look
at something that unpacks jars and reassembles them into a new JAR,
like the Maven Assembly or Shade plugins.

But usually moving a few extra JARs isn't too difficult.


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Apr 23, 2013 at 9:37 PM, Xi Shen davidshe...@gmail.com wrote:
 Hi,

 I developed a data import handler, it has some dependent libraries. I
 deployed them in a parallel folder with my JAR and included the path in
 solrconfig.xml. It works fine. But I am thinking maybe I can pack those JAR
 libs within my JAR, but I got NoClassDefFoundError exception when executing
 my DIH.

 Is it possible for Solr to load JAR libs packed inside my JAR? How can I do that?


 --
 Regards,
 David Shen

 http://about.me/davidshen
 https://twitter.com/#!/davidshen84


Re: Deletes and inserts

2013-04-24 Thread Michael Della Bitta
We're using aliases to control visibility of collections we rebuild
from scratch nightly. It works pretty well. If you run CREATEALIAS
again, it'll switch to a new one, not augment the old one.

If for some reason, you want to bridge more than one collection, you
can add more than one collection to the alias at creation time, but
then it becomes read-only.
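
If you're scripting the switch from Java, a rough SolrJ sketch of the
CREATEALIAS call (host, alias and collection names are invented; it just hits
the Collections API over HTTP):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class AliasSwitcher {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // point the 'suggest' alias at tonight's freshly built collection;
    // running CREATEALIAS again later simply repoints the alias
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATEALIAS");
    params.set("name", "suggest");
    params.set("collections", "suggest_20130424");

    QueryRequest request = new QueryRequest(params);
    request.setPath("/admin/collections");
    server.request(request);
  }
}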

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Apr 24, 2013 at 12:26 PM, Jon Strayer j...@strayer.org wrote:
 We are using a Solr collection to serve auto complete suggestions.  We'd
 like for the update to be without any noticeable delay for the users.

 I've been looking at adding new cores, loading them with the new data and
 then swapping them with the current ones, but I don't see how that
 would work in a cloud installation.  It seems that when I create a new core
 it is part of the collection and the old data will start replicating to it.
  Is that correct?

 I've also looked at standing up a new collection and then adding an alias
 for it, but that's not well documented.  If the alias already exists and I
 add it to another collection, is it removed from the first collection?

 I'm open to any suggestions.

 --
 To *know* is one thing, and to know for certain *that* we know is another.
 --William James


SOLR Install

2013-04-24 Thread Peri Subrahmanya
I'm trying to use solr as part of another maven based web application. I'm
not sure how to wire the two war files together. Any help please? I found this
documentation in SOLR but am unsure how to go about it.
 
<!-- If you are wiring Solr into a larger web application which controls
     the web context root, you will probably want to mount Solr under
     a path prefix (app.war with /app/solr mounted into it, for example).
     You will need to put this prefix in front of the SolrDispatchFilter
     url-pattern mapping too (/solr/*), and also on any paths for
     legacy Solr servlet mappings you may be using.
     For the Admin UI to work properly in a path-prefixed configuration,
     the admin folder containing the resources needs to be under the app context root
     named to match the path-prefix.  For example:

     .war
        xxx
          js
            main.js
-->
<!--
<init-param>
  <param-name>path-prefix</param-name>
  <param-value>/xxx</param-value>
</init-param>
-->


Thank you,
Peri Subrahmanya




On 4/24/13 12:52 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:

solrservice.php and the text of that error both sound like parts of
Typo3... they're definitely not part of Solr. You should ask on a list
devoted to Typo3 to figure out what to do in this situation. It likely
won't involve reconfiguring Solr.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn't a Game


On Wed, Apr 24, 2013 at 11:53 AM, vishal gupta vishalgup...@yahoo.co.in
wrote:
 Hi i am using Solr 4.2.0 and extension 2.8.2  with Typo3. Whever I try
to do
 indexing pages and news pages It gets only 3.29% indexed. I checked a
 developer log and found error in solrservice.php. And in solr admin it
is
 giving Dups is not defined please add it. What should i do in this
case?
 If possible please send me the settings of schema.xml and
solrconfig.xml .i
 am new to typo3 and solr both.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-indeing-Partially-working-tp4058623.html
 Sent from the Solr - User mailing list archive at Nabble.com.












Re: Solr indexing Partially working

2013-04-24 Thread Gora Mohanty
On 24 April 2013 22:22, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 solrservice.php and the text of that error both sound like parts of
 Typo3... they're definitely not part of Solr. You should ask on a list
 devoted to Typo3 to figure out what to do in this situation. It likely
 won't involve reconfiguring Solr.

You would definitely have better luck asking on a TYPO3
list. Also, I would check the version of Solr supported by
the extension: 4.2.0 is pretty new, and might not be supported.

Regards,
Gora


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-24 Thread SandeepM

One of our main concerns is that Solr returns the best match based on what it
thinks is best.  It uses the Levenshtein distance metric to determine the
best suggestions.   Can we tune this to put more weight on the number of
frequency/hits vs. the number of edits?   If we can tune this, suggestions
would seem more relevant when corrected.    Also, if we can do this while
keeping maxCollations = 1 and maxCollationTries = some reasonable number so
that QTime does not go out of control, that will be great!   

Any insights into this would be great. Thanks for your help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058655.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-24 Thread Dyer, James
When getting collations there are two steps. 

First, the spellchecker gets individual word choices for each misspelled word.  
By default, these are sorted by string distance first, then document frequency 
second.  You can override this by specifying <str 
name="comparatorClass">freq</str> in your spellchecker component configuration 
in solrconfig.xml.  The example provided in the distribution has a 
commented-out section explaining this.

In the second step, one correction is taken off each list and checked against 
the index to see if it is a valid collation.  To be valid, it needs to return at 
least 1 hit.  The order in which word combinations are tried is dictated by 
the first step.  Once it runs out of tries, runs out of suggestions, or has 
enough valid collations, it stops.  You cannot configure this to try a bunch 
and sort by # hits or anything like that.  You would have to specify a large # 
of collations to be returned and do this in your application.  But this can run 
the risk of high qtimes.

So you can sort by frequency, but not by hits.  Sorting by hits would mean 
trying a lot of collations and that is probably too expensive.

One caveat is that sorting by frequency could result in far afield results 
being returned to the user.  You might find that lower-frequency, 
smaller-edit-distance suggestions are going to give the user what they want 
more than higher-edit-distance, higher-frequency suggestions.  Just because a 
word is very common doesn't mean it is the right word.  This is why distance 
is the default and not freq.  

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Wednesday, April 24, 2013 12:13 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.


One of our main concerns is the solr returns the best match based on what it
thinks is the best.  It uses Levenshtein's distance metrics to determine the
best suggestions.   Can we tune this to put more weightage on the number of
frequency/hits vs the number of edits ?   If we can tune this, suggestions
would seem more relevant when corrected.Also, if we can do this while
keeping maxCollation = 1 and maxCollationTries = some reasonable number so
that QTime does not go out of control that will be great!   

Any insights into this would be great. Thanks for your help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058655.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Noob question: why doesn't this query work?

2013-04-24 Thread Brian Hurt
Thanks for your response.  You've given me some solid leads.


On Wed, Apr 24, 2013 at 11:25 AM, Shawn Heisey s...@elyograg.org wrote:

 On 4/24/2013 8:59 AM, Brian Hurt wrote:
  So, I'm executing the following query:
  id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND
 (NOT
  id:6178ZwWj5m OR numfields:[* TO 6114] OR d_4:false OR NOT
  i_4:6142E=m)
 
  It's machine generated, which explains the redundancies.  The problem is
  that the query returns no results- but there is a document that should
  match- it has an id of 6178dB=@Fm, an i_0 field of 613OFS, an i_3
 field
  of 6111, a numfields of 611A, a d_4 of true (but this shouldn't
  matter), and an i_4 of 6142F1S.
 
  The problem seems to be with the negations.  I did try to replace the
 NOT's
  with -'s, so, for example, NOT id:6178ZwWj5m would become
  -id:6178ZwWj5m, and this didn't seem to work.
 
  Help?  What's wrong with the query?  Thanks.

 It looks like you might have meant to negate all of the query clauses
 inside the last set of parentheses.  That's not what your actual query
 says. If you change your negation so that the NOT is outside the
 parentheses, so that it reads AND NOT (... OR ...), that should fix
 that part of it.


No, I meant the NOT to only bind to the next id.  So the query I wanted was:

id:6178dB=@Fm AND i_0:613OFS AND (i_3:6111 OR i_3:1yyy\~) AND ((NOT
id:6178ZwWj5m) OR numfields:[* TO 6114] OR d_4:false OR (NOT
i_4:6142E=m))



 If the boolean layout you have is really what you want, then you need to
 change the negation queries to (*:* -query) instead, because pure
 negative queries are not supported.  That syntax says all documents
 except those that match the query.  For simple negation queries, Solr
 can figure out that it needs to add the *:* internally, but this query
 is more complex.


This could be the problem.  This query is machine generated, so I don't
care how ugly it is.  Does this apply even to inner queries?  I.e., should
that last clause be (*:* -i_4:6142E=m) instead of (NOT i_4:6142E=m)?


 A few other possible problems:

 A backslash is a special character used to escape other special
 characters, so you *might* need two of them - one to say 'the next
 character is literal' and one to actually be the backslash.  If you
 follow the advice in the next paragraph, I can guarantee this will be
 the case.  For that reason, you might want to keep the quotes on fields
 that might contain characters that have special meaning to the Solr
 query parser.


I always run all strings through ClientUtils.escapeQueryChars, so this
isn't a problem.  That string should just be 1yyy~; the ~ was getting
escaped.


 Don't use quotes unless you really are after phrase queries or you can't
 escape special characters.  You might actually need phrase queries for
 some of this, but I would try simple one-field queries without the
 quotes to see whether you need them.  I have no idea what happens if you
 include quotes inside a range query (the 6114), but it might not do
 what you expect.  I would definitely remove the quotes from that part of
 the query.


This is another solid possibility, although it might raise some
difficulties for me - I need to be able to support literal string
comparisons, so I'm not sure how well this would support s_7 =
some string with spaces sorts of queries.  But some experimentation here
is definitely in order.


 Thanks,
 Shawn




Re: Noob question: why doesn't this query work?

2013-04-24 Thread Chris Hostetter

: This could be the problem.  This is query is machine generated, so I don't
: care how ugly it is.  Does this apply even to inner queries?  I.e., should
: that last clause be (*:* -i_4:6142E=m) instead of (NOT I-4:6142E=m)?

yes -- you can't exclude 6142E=m w/o defining what set (ie: the set 
of all documents: *:*) you are excluding it from.
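
For what it's worth, generating that clause from Java is trivial (a tiny 
sketch; the field and value are just the ones from this thread):

import org.apache.solr.client.solrj.util.ClientUtils;

public class NegationClause {
  // build a self-contained negative clause: (*:* -field:value)
  static String notClause(String field, String rawValue) {
    return "(*:* -" + field + ":" + ClientUtils.escapeQueryChars(rawValue) + ")";
  }

  public static void main(String[] args) {
    // prints: (*:* -i_4:6142E=m)
    System.out.println(notClause("i_4", "6142E=m"));
  }
}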

Related reading about building up nested queries with parens and using the 
AND/OR/NOT syntax...

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

-Hoss


Re: solr.StopFilterFactory doesn't work with wildcard

2013-04-24 Thread Chris Hostetter

: In any case, technically, the stop filter is doing exactly what it is supposed
: to do.

Jack has kind of glossed over some key questions here...

1) why are you using StopFilterFactory in your multiterm analyzer like 
this?
2) what do you expect it to do if series is in your stopwords and 
someone queries for series*

: fieldType name=search_string class=solr.TextField
: positionIncrementGap=100
: analyzer type=query
: tokenizer class=solr.WhitespaceTokenizerFactory /
: filter class=solr.StopFilterFactory words=stopwords.txt
: ignoreCase=true/
: /analyzer
: analyzer type=multiterm
: tokenizer class=solr.WhitespaceTokenizerFactory /
: filter class=solr.StopFilterFactory words=stopwords.txt
: ignoreCase=true/
: /analyzer
: /fieldType


-Hoss


Re: Noob question: why doesn't this query work?

2013-04-24 Thread Shawn Heisey

On 4/24/2013 12:13 PM, Brian Hurt wrote:

If the boolean layout you have is really what you want, then you need to
change the negation queries to (*:* -query) instead, because pure
negative queries are not supported.  That syntax says all documents
except those that match the query.  For simple negation queries, Solr
can figure out that it needs to add the *:* internally, but this query
is more complex.


This could be the problem.  This query is machine generated, so I don't
care how ugly it is.  Does this apply even to inner queries?  I.e., should
that last clause be (*:* -i_4:6142E=m) instead of (NOT i_4:6142E=m)?


Exactly right.


I wash all strings through ClientUtils.escapeQueryChars always, so this
isn't a problem.  That string should just be 1yyy~, the ~ was getting
escaped.


A quick check with debugQuery seems to confirm my thoughts on this - if 
you have the quotes, the escaping isn't necessary, although including it 
appears to be working correctly too.  Depending on exactly what field 
type you have, you might be good there.



Don't use quotes unless you really are after phrase queries or you can't
escape special characters.  You might actually need phrase queries for
some of this, but I would try simple one-field queries without the
quotes to see whether you need them.  I have no idea what happens if you
include quotes inside a range query (the 6114), but it might not do
what you expect.  I would definitely remove the quotes from that part of
the query.


This is another solid possibility, although it might raise some
difficulties for me- I need to be able to support literal string
comparisons, so I'm not sure how well this would support the query s_7 =
some string with spaces sorts of queries.  But some experimentation here
is definitely in order.


Due to the query parser trying to be smart, quotes appear to be 
necessary if spaces are part of your indexed values and your query.


Since I now know that you don't want to negate the range query, it makes 
sense for me to tell you that a value of 611A is outside the range [* TO 
6114], because numbers are lower than letters when doing string 
comparisons.  This was why I thought you might be trying to negate the 
entire query clause - it's the only way that particular piece would match.
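
A one-liner illustrates the string ordering (plain Java):

public class StringOrder {
  public static void main(String[] args) {
    // prints a positive number: "611A" sorts after "6114",
    // because '4' (0x34) is lower than 'A' (0x41)
    System.out.println("611A".compareTo("6114"));
  }
}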


Thanks,
Shawn



Re: Update on shards

2013-04-24 Thread Mark Miller
Sorry - need to correct myself - updates worked the same as read requests - 
they also needed to hit a SolrCore in order to get forwarded to the right node. 
I was not thinking clearly when I said this applied to just reads and not 
writes. Both needed a SolrCore to do their work - with the request proxying, 
this is no longer the case, so you can hit Solr instances with no SolrCores or 
with SolrCores that are not part of the collection you are working with, and 
both read and write side requests are now proxied to a suitable node that has a 
SolrCore that can do the search or forward the update (or accept the update).

- Mark

On Apr 23, 2013, at 3:38 PM, Mark Miller markrmil...@gmail.com wrote:

 We have a 3rd release candidate for 4.3 being voted on now.
 
 I have never tested this feature with Tomcat - only Jetty. Users have 
 reported it does not work with Tomcat. That leads one to think it may have a 
 problem in other containers as well.
 
 A previous contributor donated a patch that explicitly flushes a stream in 
 our proxy code - he says this allows the feature to work with Tomcat. I 
 committed this feature - the flush can't hurt, and given the previous 
 contributions of this individual, I'm fairly confident the fix makes things 
 work in Tomcat. I have no first hand knowledge that it does work though.
 
 You might take the RC for a spin and test it out yourself: 
 http://people.apache.org/~simonw/staging_area/lucene-solr-4.3.0-RC3-rev1470846/
 
 - Mark
 
 On Apr 23, 2013, at 3:20 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 
 Hi Mark;
 
 All in all you say that when 4.3 is tagged at repository (I mean when it is
 ready) this feature will work for Tomcat too at a stable version?
 
 
 2013/4/23 Mark Miller markrmil...@gmail.com
 
 
 On Apr 23, 2013, at 2:49 PM, Shawn Heisey s...@elyograg.org wrote:
 
 What exactly is the 'request proxying' thing that doesn't work on
 tomcat?  Is this something different from basic SolrCloud operation where
 you send any kind of request to any server and they get directed where they
 need to go? I haven't heard of that not working on tomcat before.
 
 Before 4.2, if you made a read request to a node that didn't contain part
 of the collection you where searching, it would return 404. Write requests
 would be forwarded to where they belong no matter what node you sent them
 to, but read requests required that node have a part of the collection you
 were accessing.
 
 In 4.2 we added request proxying for this read side case. If a piece of
 the collection you are querying is not found on the node you hit, a simple
 proxy of the request is done to a node that does contain a piece of the
 collection.
 
 - Mark
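
For illustration, a minimal SolrJ sketch of the behavior described above (the ZooKeeper address and collection name are placeholder assumptions; with plain HTTP, any node can now accept or proxy both reads and writes):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AnyNodeExample {
    public static void main(String[] args) throws Exception {
        // ZK address and collection name are assumptions for this sketch
        CloudSolrServer server = new CloudSolrServer("zkhost:2181");
        server.setDefaultCollection("collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        server.add(doc);      // forwarded to the shard leader that owns it
        server.commit();

        server.query(new SolrQuery("*:*")); // any node can serve or proxy this
        server.shutdown();
    }
}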
 



How to set compression on stored fields in SOLR 4.2.1

2013-04-24 Thread William Bell
https://issues.apache.org/jira/browse/LUCENE-4226
It mentions that we can set the compression mode:
FAST, HIGH_COMPRESSION, FAST_DECOMPRESSION.
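
For reference, stored-field compression is set at the Lucene codec level; a minimal sketch of a custom codec that swaps in HIGH_COMPRESSION could look like this (the class name, format name and chunk size are assumptions, and Solr would additionally need a codecFactory entry in solrconfig.xml to load it):

import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;

// Sketch only: delegate everything to the default 4.2 codec except the
// stored fields format, which is replaced with a HIGH_COMPRESSION variant.
public class HighCompressionCodec extends FilterCodec {
    public HighCompressionCodec() {
        super("HighCompressionCodec", new Lucene42Codec());
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        // 16 KB chunk size is an arbitrary assumption
        return new CompressingStoredFieldsFormat(
            "HighCompressionStoredFields",
            CompressionMode.HIGH_COMPRESSION, 1 << 14);
    }
}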


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SOLR 4.3

2013-04-24 Thread William Bell
OK I did not see it on the latest 4.3 RC3.


On Wed, Apr 24, 2013 at 4:52 AM, Jan Høydahl jan@cominvent.com wrote:

 As you can see on the issue, it is already fixed for 4.3

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 24. apr. 2013 kl. 07:02 skrev William Bell billnb...@gmail.com:

  Can we get this in please to 4.3?
 
  https://issues.apache.org/jira/browse/SOLR-4746
 
 
  --
  Bill Bell
  billnb...@gmail.com
  cell 720-256-8076




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


full-import takes 4 days(48 hours) to complete where main db table size 700k only

2013-04-24 Thread srinalluri
Hi,

Environment is Solr 3.6.1. The database has enough indexes. The box has
enough memory. The DB performance is good. Auto commit is enabled
for every 1 minute. 
Please see the following entity. The full-import of this entity takes
over 48 hours to complete on the production environment. The number of
records in the main table is only around 700,000.  I tried a materialized
view, but that view has duplicate records, so I can't go with a materialized
view for all these queries. 
Can someone please suggest how to improve the performance of full-import?

entity name=oracle-article dataSource=oracle pk=VCMID
preImportDeleteQuery=content_type:article AND repository:oracleqa
query=select ID as VCMID from tab_story2 order by published_date desc
deltaImportQuery=select '${dataimporter.delta.VCMID}' as VCMID from dual
deltaQuery=select s2.ID as VCMID from tab_story2 s2, gnasmomap mms2,
gnasmometadata mmd where s2.id = mms2.keystring1 and mms2.recordid =
mmd.contentmgmtid and mmd.lastpublishdate > ((CAST(SYS_EXTRACT_UTC(TIMESTAMP
'${dataimporter.oracle-article.last_index_time}') AS DATE) -
TO_DATE('01/01/1970 00:00:00', 'MM-DD- HH24:MI:SS')) * 24 * 60 * 60 *
1000)-30
entity name=recordid dataSource=oracle
transformer=TemplateTransformer query=select RECORDID from gnasmomap
where keystring1 = '${oracle-article.VCMID}'
field column=content_type template=article/
field column=RECORDID name=native_id/
field column=repository template=oracleqa/
/entity
entity name=article_details dataSource=oracle
transformer=ClobTransformer,RegexTransformer,script:trimTicker,script:hasBody,script:hasDeck
query=select STORY_TITLE, STORY_HEADLINE, SOURCE, DECK,
regexp_replace(body, '\p\\[(pullquote|summary)\]\/p\|\[video
[0-9]+?\]|\[youtube .+?\]', '') as BODY, PUBLISHED_DATE, MODIFIED_DATE,
DATELINE, REPORTER_NAME, TICKER_CODES,ADVERTORIAL_CONTENT from tab_story2
where id = '${oracle-article.VCMID}'
field column=STORY_TITLE name=title/
field column=DECK name=description clob=true/
field column=PUBLISHED_DATE name=date/
field column=MODIFIED_DATE name=last_modified_date/
field column=BODY name=body clob=true/
field column=SOURCE name=source/
field column=DATELINE name=dateline/
field column=STORY_HEADLINE name=export_headline/
field column=ticker splitBy=, sourceColName=TICKER_CODES/
field column=ADVERTORIAL_CONTENT name=advertorial_content/
field column=has_body sourceColName=body/
field column=has_description sourceColName=description/
/entity
entity name=site dataSource=oracle query=select CASE WHEN
site.name='fq2' THEN 'fqn' WHEN site.name='fq' THEN 'sbc' WHEN
site.name='fq-lat' THEN 'latino' ELSE 'gc' END SITE, CASE WHEN
site.name='fq2' THEN 'v8-qa.tabbusiness.com' WHEN site.name='fb' THEN
'v8-qa.smallbusiness.tabbusiness.com' WHEN site.name='qn-latino' THEN
'v8-qa.latino.tabdays.com' ELSE 'v8-qa.tabdays.com' END SERVER from
gnasmomap mm, gnaschannelfileassociation cfa, gnaschannel ch, gnassite site
where mm.keystring1 = '${oracle-article.VCMID}' and mm.recordid =
cfa.vcmobjectid and cfa.channelid = ch.id and ch.siteid = site.id and rownum
= 1
field column=SITE name=site/
entity name=url dataSource=oracle query=select 'http://' ||
'${site.SERVER}' || furl as URL from tab_furl where parent_id =
'${oracle-article.VCMID}'
field column=URL name=url/
/entity
entity name=image dataSource=oracle transformer=script:hasImageURL
query=select distinct('http://qa.global.fqstatic.com' || sourcepath) as
IMAGE_URL from ( select mc.sourcepath from tab_rel_content rc, tab_story2
st, gnasmomap mm, dsx_media_common mc where rc.parent_id =
'${oracle-article.VCMID}' and rc.parent_id = st.id and (st.NO_FEATURED_MEDIA
!= 'yes' OR st.NO_FEATURED_MEDIA is null) and rc.ref_id = mm.recordid and
mm.keystring1 = mc.mediaid and rc.rank = 1 union all select mc.sourcepath
from tab_rel_content arm, tab_story2 st, gnasmomap cmm, tab_rel_media crm,
gnasmomap mmm, dsx_media_common mc where arm.parent_id =
'${oracle-article.VCMID}' and arm.parent_id = st.id and
(st.NO_FEATURED_MEDIA !='yes' OR st.NO_FEATURED_MEDIA is null) and
arm.ref_id = cmm.recordid and cmm.keystring1 = crm.parent_id and crm.rank =
1 and crm.ref_id = mmm.recordid and mmm.keystring1 = mc.mediaid and arm.rank
= 1)
field column=IMAGE_URL name=image_url/
field column=has_image_url sourceColName=IMAGE_URL/
/entity
/entity
entity name=taxonomy dataSource=oracle query=select tc.PATH from
gnasmomap mm, gndaassociation ass, gndataxonomycategory tc where mm.recordid
= ass.cmsobjectid and ass.categoryid = tc.id and mm.keystring1 =
'${oracle-article.VCMID}'
field column=PATH name=taxonomy_path/
/entity
entity name=keyword dataSource=oracle
transformer=RegexTransformer,script:trimKeyword query=select KEYWORDS
from tab_rel_metadata where parent_id = '${oracle-article.VCMID}'
field column=keyword splitBy=, sourceColName=KEYWORDS/
/entity
entity name=author dataSource=oracle query=select pmm.recordid as
author_id, trim(trim(trailing ',' from 

Re: full-import takes 4 days(48 hours) to complete where main db table size 700k only

2013-04-24 Thread Alexandre Rafalovitch
1) You may have a small primary table, but for each ID in it you seem
to be calling another 6 tables with nested SQL queries. Perhaps you
need to cache those calls (see the sketch after this list):
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

2) You seem to be double-dipping into the main table tab_story2 in the
nested entity; perhaps there is a way to avoid that.
3) You are sorting the main table in the outside query. Why? You are
going to process every record anyway.
4) Auto-commit is probably way too expensive here. Try setting it to
every 2 minutes without changing anything else and see how many more
entities you process in the same X minutes. In Solr 4+, there are
better options for commit.
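
As a sketch of point 1, the recordid child entity could be cached like this (attribute syntax per the CachedSqlEntityProcessor wiki page; adding KEYSTRING1 to the select list is an assumption needed for the cache key):

<entity name="recordid" dataSource="oracle" transformer="TemplateTransformer"
        processor="CachedSqlEntityProcessor"
        query="select KEYSTRING1, RECORDID from gnasmomap"
        where="keystring1=oracle-article.VCMID">
  <field column="content_type" template="article"/>
  <field column="RECORDID" name="native_id"/>
  <field column="repository" template="oracleqa"/>
</entity>

This runs one query for the whole table and serves each per-document lookup from an in-memory cache instead of a database round trip.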

Regards,
  Alex

On Wed, Apr 24, 2013 at 3:25 PM, srinalluri nallurisr...@yahoo.com wrote:
 [...]



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


RE: Error while starting Solr on Websphere

2013-04-24 Thread Van Tassell, Kristian
I'm having this same issue (versions and all). Was there ever a response to 
this question? I can't seem to find one. Thanks in advance!









From divz80 divya.god...@gmail.com

Subject Error while starting Solr on Websphere

Date  Wed, 20 Mar 2013 23:13:07 GMT



Hi,



I'm attempting to set up Solr 4.2.0 on IBM WebSphere 8.5. I've deployed the 
solr.war and when I try to access the admin page, I get this error.



Error 503: Server is shutting down



The log files has this error:



[3/20/13 18:56:33:564 EDT] 0061 HttpClientUti I 
org.apache.solr.client.solrj.impl.HttpClientUtil createClient Creating new http 
client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false

[3/20/13 18:56:33:592 EDT] 0061 SolrDispatchF E 
org.apache.solr.servlet.SolrDispatchFilter init Could not start Solr. Check 
solr/home property and the logs

[3/20/13 18:56:33:639 EDT] 0061 SolrCore  E 
org.apache.solr.common.SolrException log null:java.lang.NoSuchMethodError: 
org/apache/http/conn/scheme/Scheme.<init>(Ljava/lang/String;ILorg/apache/http/conn/scheme/SchemeSocketFactory;)V



I verified that the latest jar, httpclient-4.2.3.jar is in the lib folder, 
there are no older versions of the jar.

Is there any other configuration step I'm missing, or a property I need to set?


Re: full-import takes 4 days(48 hours) to complete where main db table size 700k only

2013-04-24 Thread Michael Della Bitta
How long do those queries take to execute and return all their rows
outside of DataImportHandler?

I'd bring those queries into SQL Developer and get an explain plan on
them to find out if any of them are much slower than the other.

You might have only 700k documents for your index, but you're
issuing a separate query for every entity for every document. Multiply
that number of queries times the average round trip latency to your
database and that's the amount of time your app server and database
server spend sitting around doing nothing, waiting for messages to
arrive. If you can remove any of those entities in favor of joins,
you'll be doing yourself a favor.
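
A rough back-of-the-envelope illustration (the 40 ms round trip is an assumption): 700,000 documents × 6 child entities = 4,200,000 queries; at 40 ms each that is 168,000,000 ms, or about 46.7 hours, which is in line with the 48 hours observed.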

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Apr 24, 2013 at 3:25 PM, srinalluri nallurisr...@yahoo.com wrote:
 [...]

Re: Error while starting Solr on Websphere

2013-04-24 Thread Gora Mohanty
On 25 April 2013 01:42, Van Tassell, Kristian
kristian.vantass...@siemens.com wrote:
 I'm having this same issue (versions and all). Was there ever a response to 
 this question? I can't seem to find one. Thanks in advance!
[...]

As the error message says, my first guess would be that solr/home
is not set  properly. Please see:
http://wiki.apache.org/solr/SolrInstall#Setup and also
http://wiki.apache.org/solr/SolrWebSphere

You might also want to first try to get Solr working with
the embedded Jetty as that is the most straightforward
way to get started, and is fine for production use also.

Regards,
Gora


RE: Error while starting Solr on Websphere

2013-04-24 Thread divz80
I never got it to work on Websphere 8.5. We are using Websphere 7 in
production, so I deployed the same app (no changes) and it worked on
Websphere 7.





RE: Error while starting Solr on Websphere

2013-04-24 Thread Van Tassell, Kristian
Thanks for the reply.

I have setup Solr on Jetty, Tomcat, JBoss and WebLogic (we have to be able to 
deploy to multiple server types for our doc). On this particular machine, I've 
set up WebLogic as well with the same Solr home (although WebLogic is stopped 
at the moment so they don't compete over the same index). 

For the WebSphere instance, I have the Solr home defined by the startup script 
(defined by JAVA_OPT, essentially, as -Dsolr.solr.home=D:/solr - which is very 
similar to how WebLogic, JBoss and Tomcat are set up). 

Anyways, perhaps I'll set up a separate solr home entirely from the WebLogic 
instance (just to be sure).

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Wednesday, April 24, 2013 3:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Error while starting Solr on Websphere

On 25 April 2013 01:42, Van Tassell, Kristian kristian.vantass...@siemens.com 
wrote:
 I'm having this same issue (versions and all). Was there ever a response to 
 this question? I can't seem to find one. Thanks in advance!
[...]

As the error message says, my first guess would be that solr/home is not set  
properly. Please see:
http://wiki.apache.org/solr/SolrInstall#Setup and also 
http://wiki.apache.org/solr/SolrWebSphere

You might also want to first try to get Solr working with the embedded Jetty as 
that is the most straightforward way to get started, and is fine for production 
use also.

Regards,
Gora


Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
I have added these fields:

<field name="text" type="text_general" indexed="true" stored="true"/>
<dynamicField name="attr_*" type="text_general" indexed="true"
stored="true" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored"/>

and I have this definition:

<fieldtype name="ignored" stored="false" indexed="false" multiValued="true"
class="solr.StrField"/>

here is my error:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">4154</int>
</lst>
<lst name="error">
<str name="msg">ERROR: [doc=1] unknown field 'ignored_meta'</str>
<int name="code">400</int>
</lst>
</response>

What more should I do?

2013/4/24 Erik Hatcher erik.hatc...@gmail.com

 Also, at Solr startup time it logs what it loads from those lib
 elements, so you can see whether it is loading the files you intend to or
 not.

 Erik

 On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote:

  Have you tried using absolute path to the relevant urls? That will
  cleanly split the problem into 'still not working' and 'wrong relative
  path'.
 
  Regards,
Alex.
  On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  lib dir=../../../contrib/extraction/lib regex=.*\.jar /
   lib dir=../../../dist/ regex=solr-cell-\d.*\.jar /
 
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)




filter before facet

2013-04-24 Thread Daniel Tyreus
We're testing SolrCloud 4.1 for NRT search over hundreds of millions of
documents. I've been really impressed. The query performance is so much
better than we were getting out of our database.

With filter queries, we're able to get query times of less than 100ms under
moderate load. That's amazing.

My question today is on faceting. Let me give some examples to help make my
point.

*fq=state:California*
numFound = 92193
QTime = *80*

*fq=state:Calforni*
numFound = 0
QTime = *8*

*fq=state:California&facet=true&facet.field=city*
numFound = 92193
QTime = *1316*

*fq=city:"San Francisco"&facet=true&facet.field=city*
numFound = 1961
QTime = *1477*

*fq=state:Californi&facet=true&facet.field=city*
numFound = 0
QTime = *1380*

So filtering is fast and faceting is slow, which is understandable.

But why is it slow to generate facets on a result set of 0? Furthermore,
why does it take the same amount of time to generate facets on a result set
of 2000 as 100,000 documents?

This leads me to believe that the FQ is being applied AFTER the facets are
calculated on the whole data set. For my use case it would make a ton of
sense to apply the FQ first and then facet. Is it possible to specify this
behavior or do I need to get into the code and get my hands dirty?

Best Regards,
Daniel


Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
Wrong case for fieldType? Though I would have thought Solr would
complain about that when it hits a dynamicField with an unknown type.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 4:59 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 I have added that fields:

 field name=text type=text_general indexed=true stored=true/
 dynamicField name=attr_* type=text_general indexed=true
 stored=true multiValued=true/
 dynamicField name=ignored_* type=ignored/

 and I have that definition:

 fieldtype name=ignored stored=false indexed=false multiValued=true
 class=solr.StrField /

 here is my error:

 ?xml version=1.0 encoding=UTF-8?
 response
 lst name=responseHeader
 int name=status400/int
 int name=QTime4154/int
 /lst
 lst name=error
 str name=msgERROR: [doc=1] unknown field 'ignored_meta'/str
 int name=code400/int
 /lst
 /response

 What should I do more?

 2013/4/24 Erik Hatcher erik.hatc...@gmail.com

 Also, at Solr startup time it logs what it loads from those lib
 elements, so you can see whether it is loading the files you intend to or
 not.

 Erik

 On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote:

  Have you tried using absolute path to the relevant urls? That will
  cleanly split the problem into 'still not working' and 'wrong relative
  path'.
 
  Regards,
Alex.
  On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  lib dir=../../../contrib/extraction/lib regex=.*\.jar /
   lib dir=../../../dist/ regex=solr-cell-\d.*\.jar /
 
 
 
  Personal blog: http://blog.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all
  at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
  book)




Re: Indexing PDF Files

2013-04-24 Thread Erik Hatcher
Did you restart after adding those fields and types?

On Apr 24, 2013, at 16:59, Furkan KAMACI furkankam...@gmail.com wrote:

 I have added that fields:
 
 field name=text type=text_general indexed=true stored=true/
 dynamicField name=attr_* type=text_general indexed=true
 stored=true multiValued=true/
 dynamicField name=ignored_* type=ignored/
 
 and I have that definition:
 
 fieldtype name=ignored stored=false indexed=false multiValued=true
 class=solr.StrField /
 
 here is my error:
 
 ?xml version=1.0 encoding=UTF-8?
 response
 lst name=responseHeader
 int name=status400/int
 int name=QTime4154/int
 /lst
 lst name=error
 str name=msgERROR: [doc=1] unknown field 'ignored_meta'/str
 int name=code400/int
 /lst
 /response
 
 What should I do more?
 
 2013/4/24 Erik Hatcher erik.hatc...@gmail.com
 
 Also, at Solr startup time it logs what it loads from those lib
 elements, so you can see whether it is loading the files you intend to or
 not.
 
Erik
 
 On Apr 24, 2013, at 10:05 , Alexandre Rafalovitch wrote:
 
 Have you tried using absolute path to the relevant urls? That will
 cleanly split the problem into 'still not working' and 'wrong relative
 path'.
 
 Regards,
  Alex.
 On Wed, Apr 24, 2013 at 9:02 AM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 lib dir=../../../contrib/extraction/lib regex=.*\.jar /
 lib dir=../../../dist/ regex=solr-cell-\d.*\.jar /
 
 
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 


Re: filter before facet

2013-04-24 Thread Alexandre Rafalovitch
What's your facet.method? Have you tried setting it both ways?
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 5:10 PM, Daniel Tyreus dan...@webshots.com wrote:
 We're testing SolrCloud 4.1 for NRT search over hundreds of millions of
 documents. I've been really impressed. The query performance is so much
 better than we were getting out of our database.

 With filter queries, we're able to get query times of less than 100ms under
 moderate load. That's amazing.

 My question today is on faceting. Let me give some examples to help make my
 point.

 *fq=state:California*
 numFound = 92193
 QTime = *80*

 *fq=state:Calforni*
 numFound = 0
 QTime = *8*

 *fq=state:California&facet=true&facet.field=city*
 numFound = 92193
 QTime = *1316*

 *fq=city:"San Francisco"&facet=true&facet.field=city*
 numFound = 1961
 QTime = *1477*

 *fq=state:Californi&facet=true&facet.field=city*
 numFound = 0
 QTime = *1380*

 So filtering is fast and faceting is slow, which is understandable.

 But why is it slow to generate facets on a result set of 0? Furthermore,
 why does it take the same amount of time to generate facets on a result set
 of 2000 as 100,000 documents?

 This leads me to believe that the FQ is being applied AFTER the facets are
 calculated on the whole data set. For my use case it would make a ton of
 sense to apply the FQ first and then facet. Is it possible to specify this
 behavior or do I need to get into the code and get my hands dirty?

 Best Regards,
 Daniel


***Immediate requirement for Java Solr search consultant at Bothell, WA***

2013-04-24 Thread dwayne
Hello Professionals,

This is DWAYNE from KRG Technologies; KRG is headquartered in Valencia, CA –
Incorporated in 2003, currently have over 200 consultants. We are
specialized in providing Staffing Services Solutions in Americas. We are a
Tier1 vendor in providing Professional Services on diversified IT Skills for
many customers across the country.  

We are looking for a *Java Solr search consultant* for the below mentioned
job description.  Kindly forward me your Consultant’s resume, rate and
contact details for further process. 
  
I also kindly request you to forward this opportunity to your friends or
colleagues; so that we can help someone who may be in search of a job or
looking for a change


Location - Bothell, WA
Duration – 6+ Months

Job Description:

1.  Experience of Apache SOLR Search is required. Other Search product such
as endeca search engine can also be considered.
2.  Should be familiar with search concepts, solutions and terminologies.
3.  Should possess very good knowledge of
Java/J2EE/JSP/JQuery/Ajax/JS/XML/XSL/JSON/HTML technologies.
4.  Prior experience in Agile Methodologies, Webtrends Reporting, J2EE Design
patterns will be an advantage.

Thanks & Regards,
Dwayne
25000 Avenue Stanford, # 243
Valencia, CA 91355
Direct Phone: (661) 310 1677 | Fax: (661) 257-9968 
Email: dwa...@krgtech.com | URL: www.krgtech.com






Re: Solr consultant recommendation

2013-04-24 Thread Chris Hostetter

: Subject: Solr consultant recommendation
: In-Reply-To: e8a79384-5570-4777-b90c-c59c89cf4...@cominvent.com

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Too many unique terms

2013-04-24 Thread Manuel LeNormand
Hey Erick, thanks for the interesting reply.
Indexing unicode characters is not a problem I see, nor is indexing mails. I'm
alright with defining as useless a word that is unique across my whole index.

I will try the reindexing strategy you proposed, though, as you said, having a
few million stop words will not be an easy task to maintain. More to it, I
will reduce the memory chunks that get saved to RAM, as most of it is
trash.
As my problem seems to be very specific I think I'll turn to the code to
check how I can do it on my own. Hope this adventure will go well.
Cheers,
Manu
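
As a concrete starting point, a minimal Lucene 4.x sketch (the field name and file paths are assumptions) that dumps every term with docFreq == 1 as a candidate stopword list:

import java.io.File;
import java.io.PrintWriter;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class UniqueTermDumper {
    public static void main(String[] args) throws Exception {
        DirectoryReader reader =
            DirectoryReader.open(FSDirectory.open(new File("/path/to/index")));
        PrintWriter out = new PrintWriter("unique-terms.txt", "UTF-8");
        Terms terms = MultiFields.getTerms(reader, "body"); // field name is an assumption
        if (terms != null) {
            TermsEnum te = terms.iterator(null);
            BytesRef term;
            while ((term = te.next()) != null) {
                if (te.docFreq() == 1) {   // term occurs in exactly one document
                    out.println(term.utf8ToString());
                }
            }
        }
        out.close();
        reader.close();
    }
}

Note that docFreq == 1 only means the term appears in a single document, not that it is meaningless, so the output deserves a review before being used as a stopword file.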


On Wed, Apr 24, 2013 at 2:10 PM, Erick Erickson erickerick...@gmail.comwrote:

 Even if you could know ahead of time, 7M stop words is a
 lot to maintain. But assuming that your index is really
 pretty static, you could consider building it once, then
 creating the stopword file from unique terms and re-indexing.

 You could consider cleaning them on the input side or
 creating a custom filter that, say, checked against a dictionary
 (that you'd have to find).

 There's nothing that I know of that'll allow you to delete
 unique terms from a static index.

 About a regex, you could use PatternReplaceCharFilterFactory
 to remove them from your input stream, but the trick is defining
 useless. Part numbers are really useful in some situations
 for instance. There's nothing standard because there's no
 standard. You haven't, for instance, provided any criteria for
 what useless is. Do you care about e-mails? What about
 accents? Unicode? The list gets pretty endless.

 You should be able to write a regex that removes
 everything non-alpha-numeric or some such for instance,
 although even that is a problem if you're indexing anything but
 plain-vanilla English. The Java pre-defined '\w', for instance,
 refers to [a-zA-Z_0-9]. Nary an accented character in sight.


 Best
 Erick

 On Tue, Apr 23, 2013 at 3:53 PM, Manuel Le Normand
 manuel.lenorm...@gmail.com wrote:
  Hi there,
  Looking at one of my shards (about 1M docs) i see lot of unique terms,
 more
  than 8M which is a significant part of my total term count. These are
 very
  likely useless terms, binaries or other meaningless numbers that come
 with
  few of my docs.
  I am totally fine with deleting them so these terms would be
 unsearchable.
  Thinking about it i get that
  1. It is impossible apriori knowing if it is unique term or not, so i
  cannot add them to my stop words.
  2. I have a performance decrease cause my cached chuncks do contain
 useless
  data, and im short on memory.
 
  Assuming a constant index, is there a way of deleting all terms that are
  unique from at least the dictionary tim and tip files? Will i get
  significant query time performance increase? Does any body know a class
 of
  regex that identify meaningless terms that i can add to my
 updateProcessor?
 
  Thanks
  Manu
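
Following up on the PatternReplaceCharFilterFactory suggestion quoted above, a minimal sketch of such a char filter (the pattern, which keeps letters, digits and whitespace, is just one possible definition of "useless"):

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="[^\p{L}\p{Nd}\s]" replacement=" "/>

Unicode property classes like \p{L} sidestep the [a-zA-Z_0-9] limitation of \w mentioned above.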



how to get/display JSESSIONID with solr results

2013-04-24 Thread gpssolr2020
Hi,

We are using Jetty as a container for Solr 3.6. We have two slave servers to
serve queries for user requests, and queries are distributed to either slave
through a load balancer.

When one user sends a first search request, say it goes to slave1; when
that user queries again we want to send the query to the same server with
the help of the JSESSIONID.

How do we achieve this? How do we get that JSESSIONID with the Solr search
results? Please provide your suggestions.

Thanks.






Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
I just want to search on rich documents but I still get the same error. I have
copied the example folder somewhere else on my computer. I have copied the dist
and contrib folders from my build folder into that copy of the example folder
(because solr-cell etc. are within those folders). However, I still get the
same error. If any of you could help me, you are welcome. Here is my schema:


?xml version=1.0 encoding=UTF-8 ?
!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the License); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an AS IS BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--
!--
Description: This document contains Solr 4.x schema definition to
be used with Solr integration currently build into Nutch.
This schema is not minimal, there are some useful field type definitions
left,
and the set of fields and their flags (indexed/stored/term vectors) can be
further optimized depending on needs. See
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/schema.xml?view=markup
for more info.
--

schema name=nutch version=1.5

types

!-- The StrField type is not analyzed, but indexed/stored verbatim. --
fieldType name=string class=solr.StrField sortMissingLast=true
omitNorms=true/


!--
Default numeric field types. For faster range queries, consider the
tint/tfloat/tlong/tdouble types.
--
fieldType name=int class=solr.TrieIntField precisionStep=0
omitNorms=true positionIncrementGap=0/
fieldType name=float class=solr.TrieFloatField precisionStep=0
omitNorms=true positionIncrementGap=0/
fieldType name=long class=solr.TrieLongField precisionStep=0
omitNorms=true positionIncrementGap=0/
fieldType name=double class=solr.TrieDoubleField precisionStep=0
omitNorms=true positionIncrementGap=0/

!--
Numeric field types that index each value at various levels of precision
to accelerate range queries when the number of values between the range
endpoints is large. See the javadoc for NumericRangeQuery for internal
implementation details.

Smaller precisionStep values (specified in bits) will lead to more tokens
indexed per value, slightly larger index size, and faster range queries.
A precisionStep of 0 disables indexing at different precision levels.
--
fieldType name=tint class=solr.TrieIntField precisionStep=8
omitNorms=true positionIncrementGap=0/
fieldType name=tfloat class=solr.TrieFloatField precisionStep=8
omitNorms=true positionIncrementGap=0/
fieldType name=tlong class=solr.TrieLongField precisionStep=8
omitNorms=true positionIncrementGap=0/
fieldType name=tdouble class=solr.TrieDoubleField precisionStep=8
omitNorms=true positionIncrementGap=0/

!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
is a more restricted form of the canonical representation of dateTime
http://www.w3.org/TR/xmlschema-2/#dateTime
The trailing Z designates UTC time and is mandatory.
Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
All other components are mandatory.

Expressions can also be used to denote calculations that should be
performed relative to NOW to determine the value, ie...

NOW/HOUR
... Round to the start of the current hour
NOW-1DAY
... Exactly 1 day prior to now
NOW/DAY+6MONTHS+3DAYS
... 6 months and 3 days in the future from the start of
the current day

Consult the DateField javadocs for more information.

Note: For faster range queries, consider the tdate type
--
fieldType name=date class=solr.TrieDateField omitNorms=true
precisionStep=0 positionIncrementGap=0/

!-- A Trie based date field for faster date range queries and date
faceting. --
fieldType name=tdate class=solr.TrieDateField omitNorms=true
precisionStep=6 positionIncrementGap=0/


!-- solr.TextField allows the specification of custom text analyzers
specified as a tokenizer and a list of token filters. Different
analyzers may be specified for indexing and querying.

The optional positionIncrementGap puts space between multiple fields of
this type on the same document, with the purpose of preventing false phrase
matching across fields.

For more info on customizing your analyzer chain, please see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
--

!-- A general text field that has reasonable, generic
cross-language defaults: it tokenizes with StandardTokenizer,
removes stop words from case-insensitive stopwords.txt
(empty by default), and down cases. At query time only, it
also applies synonyms. --
fieldType name=text_general 

Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
Here is my definition for handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>
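
For testing, a typical invocation of that handler looks like this (URL, id and file name are placeholders):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@test.pdf"

With uprefix=attr_, extracted fields that are not in the schema are renamed with the attr_ prefix and should be caught by the attr_* dynamic field.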




2013/4/25 Furkan KAMACI furkankam...@gmail.com

 I just want to search on rich documents but I still get same error. I have
 copied example folder into anywhere else at my computer. I have copied dist
 and contrib folders from my build folder into that copy of example folder
 (because solr-cell etc. are within that folders) However I still get same
 error. If any of you could help me you are welcome. Here is my schema:


 [...]

Fwd: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.

2013-04-24 Thread Furkan KAMACI
When I try to configure schema.xml within the example folder and start the jar
file, I get this error:

org.apache.solr.common.SolrException: copyField dest :'author_s' is not an
explicit field and doesn't match a dynamicField.

There is nothing about it in the example schema.xml file?


Solr 4.3: Too late to improve error messages?

2013-04-24 Thread Alexandre Rafalovitch
Hello,

I am testing 4.3rc3. It looks ok, but I notice that some log messages
could be more informative. For example:
680 [coreLoadExecutor-3-thread-3] WARN
org.apache.solr.schema.IndexSchema  – schema has no name!

Would be _very nice_ to know which core this is complaining about.
Later, once the core is loaded, the core name shows up in the logs,
but it would be nice to have it earlier without having to
triangulate it through 'Loading core' messages.

Is that too late for 4.3? I know somebody was looking at logging, so
maybe there is a chance.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: filter before facet

2013-04-24 Thread Daniel Tyreus
I'm actually using one not listed in that doc (I suspect it's new). At
least with 3 or more facet fields, the FCS method is by far the best.

Here are some representative numbers with everything the same except for
the facet.method.

facet.method = fc
QTime = 3168

facet.method = enum
QTime = 309

facet.method = fcs
QTime = 19
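
For reference, the method is chosen per request (or per field) with a plain parameter, e.g.:

...&facet=true&facet.field=city&facet.method=fcs

fcs does per-segment faceting for single-valued fields, so unchanged segments keep their field caches across commits, which fits an NRT setup like this one.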






On Wed, Apr 24, 2013 at 2:19 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 What's your facet.method? Have you tried setting it both ways?
 http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Apr 24, 2013 at 5:10 PM, Daniel Tyreus dan...@webshots.com
 wrote:
  We're testing SolrCloud 4.1 for NRT search over hundreds of millions of
  documents. I've been really impressed. The query performance is so much
  better than we were getting out of our database.
 
  With filter queries, we're able to get query times of less than 100ms
 under
  moderate load. That's amazing.
 
  My question today is on faceting. Let me give some examples to help make
 my
  point.
 
  *fq=state:California*
  numFound = 92193
  QTime = *80*
 
  *fq=state:Calforni*
  numFound = 0
  QTime = *8*
 
  *fq=state:California&facet=true&facet.field=city*
  numFound = 92193
  QTime = *1316*
 
  *fq=city:"San Francisco"&facet=true&facet.field=city*
  numFound = 1961
  QTime = *1477*
 
  *fq=state:Californi&facet=true&facet.field=city*
  numFound = 0
  QTime = *1380*
 
  So filtering is fast and faceting is slow, which is understandable.
 
  But why is it slow to generate facets on a result set of 0? Furthermore,
  why does it take the same amount of time to generate facets on a result
 set
  of 2000 as 100,000 documents?
 
  This leads me to believe that the FQ is being applied AFTER the facets
 are
  calculated on the whole data set. For my use case it would make a ton of
  sense to apply the FQ first and then facet. Is it possible to specify
 this
  behavior or do I need to get into the code and get my hands dirty?
 
  Best Regards,
  Daniel



Re: Indexing PDF Files

2013-04-24 Thread Alexandre Rafalovitch
You still seem to have 'fieldtype' with wrong case. Can you try that
simple thing before doing other complicated steps? And yes, restart
Solr after you change schema.xml

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 Here is my definition for handler:

 requestHandler name=/update/extract class=solr.extraction.
 ExtractingRequestHandler 
 lst name=defaults
 str name=fmap.contenttext/str
 str name=lowernamestrue/str
 str name=uprefixattr_/str
 str name=captureAttrtrue/str
 /lst
 /requestHandler




 2013/4/25 Furkan KAMACI furkankam...@gmail.com

 I just want to search on rich documents but I still get same error. I have
 copied example folder into anywhere else at my computer. I have copied dist
 and contrib folders from my build folder into that copy of example folder
 (because solr-cell etc. are within that folders) However I still get same
 error. If any of you could help me you are welcome. Here is my schema:


 [...]

Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.

2013-04-24 Thread Alexandre Rafalovitch
You are running 4.2, right?

If you searched mailing list, you would probably find that this is a
regression: https://issues.apache.org/jira/browse/SOLR-4567

Should be fixed in 4.3 (I reported this originally and it works in 4.3rc3).

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 When I try to configure schema.xml within example folder and start jar file
 I get that error:

 org.apache.solr.common.SolrException: copyField dest :'author_s' is not an
 explicit field and doesn't match a dynamicField.

 There is nothing about it at example schema.xml file?


Re: Indexing PDF Files

2013-04-24 Thread Furkan KAMACI
Hi Alex;
What do you mean by "wrong case"? Could you tell me what I should do?

2013/4/25 Alexandre Rafalovitch arafa...@gmail.com

 You still seem to have 'fieldtype' with wrong case. Can you try that
 simple thing before doing other complicated steps? And yes, restart
 Solr after you change schema.xml

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  Here is my definition for handler:
 
  requestHandler name=/update/extract class=solr.extraction.
  ExtractingRequestHandler 
  lst name=defaults
  str name=fmap.contenttext/str
  str name=lowernamestrue/str
  str name=uprefixattr_/str
  str name=captureAttrtrue/str
  /lst
  /requestHandler
 
 
 
 
  2013/4/25 Furkan KAMACI furkankam...@gmail.com
 
  I just want to search on rich documents but I still get same error. I
 have
  copied example folder into anywhere else at my computer. I have copied
 dist
  and contrib folders from my build folder into that copy of example
 folder
  (because solr-cell etc. are within that folders) However I still get
 same
  error. If any of you could help me you are welcome. Here is my schema:
 
 
  [...]

Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.

2013-04-24 Thread Furkan KAMACI
Yes, I use 4.2.1, thanks.

2013/4/25 Alexandre Rafalovitch arafa...@gmail.com

 You are running 4.2, right?

 If you searched the mailing list, you would probably find that this is a
 regression: https://issues.apache.org/jira/browse/SOLR-4567

 Should be fixed in 4.3 (I reported this originally and it works in 4.3rc3).

 Regards,
Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
  When I try to configure schema.xml within example folder and start jar
 file
  I get that error:
 
  org.apache.solr.common.SolrException: copyField dest :'author_s' is not
 an
  explicit field and doesn't match a dynamicField.
 
  There is nothing about it in the example schema.xml file?



Re: Indexing PDF Files

2013-04-24 Thread Jan Høydahl
In your schema you have written

 <fieldtype name="ignored" stored="false" indexed="false" multiValued="true"
            class="solr.StrField" />

Note that XML tag and attribute names are case sensitive, so instead of
'fieldtype' you should use 'fieldType'.

I see that you have the same error for several fieldTypes in your schema, 
probably resulting in other similar errors too.
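
For example, the declaration quoted above would become (same attributes, only
the tag name changes case):

 <fieldType name="ignored" stored="false" indexed="false" multiValued="true"
            class="solr.StrField" />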

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

25. apr. 2013 kl. 01:10 skrev Furkan KAMACI furkankam...@gmail.com:

 Hi Alex;
 What do you mean by 'wrong case'? Could you tell me what I should do?
 
 2013/4/25 Alexandre Rafalovitch arafa...@gmail.com
 
 You still seem to have 'fieldtype' with the wrong case. Can you try that
 simple thing before doing other complicated steps? And yes, restart
 Solr after you change schema.xml.
 
 Regards,
   Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Wed, Apr 24, 2013 at 6:50 PM, Furkan KAMACI furkankam...@gmail.com
 wrote:
 Here is my definition for handler:
 
 <requestHandler name="/update/extract"
                 class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
     <str name="fmap.content">text</str>
     <str name="lowernames">true</str>
     <str name="uprefix">attr_</str>
     <str name="captureAttr">true</str>
   </lst>
 </requestHandler>
 
 
 
 
 2013/4/25 Furkan KAMACI furkankam...@gmail.com
 
 I just want to search on rich documents, but I still get the same error.
 I have copied the example folder to another location on my computer, and
 I have copied the dist and contrib folders from my build folder into that
 copy of the example folder (because solr-cell etc. are within those
 folders). However, I still get the same error. If any of you could help
 me, you are welcome. Here is my schema:
 
 
 [snip: schema.xml excerpt quoted earlier in this thread]

Re: Solr 4.3: Too late to improve error messages?

2013-04-24 Thread Shawn Heisey

On 4/24/2013 5:02 PM, Alexandre Rafalovitch wrote:

I am testing 4.3rc3. It looks ok, but I notice that some log messages
could be more informative. For example:
680 [coreLoadExecutor-3-thread-3] WARN
org.apache.solr.schema.IndexSchema  – schema has no name!

Would be _very nice_ to know which core this is complaining about.
Later, once the core is loaded, the core name shows up in the logs,
but it would be nice to have it earlier without having to
triangulate it through 'Loading core' messages.

Is that too late for 4.3? I know somebody was looking at logging, so
maybe there is a chance.


I haven't been around as long as the guys who make the decisions, but I 
am fairly sure that there won't be a new release candidate for a 
cosmetic issue.


Make sure the issue is filed in Jira so that it can be fixed in 4.4. 
This is something that should definitely be fixed, but from what I have 
seen, only serious bugs will trigger a new RC, and that's only if they 
don't have a viable workaround and they can be fixed quickly.


Thanks,
Shawn



Re: SolrException: copyField dest :'author_s' is not an explicit field and doesn't match a dynamicField.

2013-04-24 Thread Steve Rowe
Alexandre, Furkan reports an error about a copyField *dest* - SOLR-4567 was 
about the copyField *source*, and the fix was included in 4.2.1.

In order for copyFields to work, the dest *must* match a field or dynamicField 
declaration in the schema - otherwise there's no way to know what type the 
destination field is.

Furkan, can you give the parts of your schema that are involved here?  Maybe 
you just need to add a *_s dynamicField with type=string?
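
Something like this (an untested sketch; it assumes your schema already defines 
a "string" fieldType, as the stock example schema does):

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>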

Steve

On Apr 24, 2013, at 7:08 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
 You are running 4.2, right?
 
  If you searched the mailing list, you would probably find that this is a
  regression: https://issues.apache.org/jira/browse/SOLR-4567
 
 Should be fixed in 4.3 (I reported this originally and it works in 4.3rc3).
 
 Regards,
   Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
 
 
 On Wed, Apr 24, 2013 at 7:02 PM, Furkan KAMACI furkankam...@gmail.com wrote:
 When I try to configure schema.xml within example folder and start jar file
 I get that error:
 
 org.apache.solr.common.SolrException: copyField dest :'author_s' is not an
 explicit field and doesn't match a dynamicField.
 
  There is nothing about it in the example schema.xml file?



Re: Pushing a whole set of pdf-files to solr

2013-04-24 Thread sdspieg
I am still struggling with this. I have Solr 4.2.1.2013.03.26.08.26.55
installed. So are you telling me that I should somehow install the older
version of that tool that comes with Solr 3.x? Because with the newer
version I get the errors I already mentioned. Now I suppose I may be an
atypical user, as I am running all of this under Windows and really just
want to find an easy way to get a whole bunch of files from a local folder
(on my hard drive) into my local instance of Solr. But is there really no
easier way of doing this?

-Stephan 





Re: Solr 4.3: Too late to improve error messages?

2013-04-24 Thread Alexandre Rafalovitch
Thanks Shawn,

I will create a JIRA issue. I just wasn't sure whether there would be
another RC afterwards that this could fit into. I'm not very familiar with
the process yet.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Apr 24, 2013 at 7:27 PM, Shawn Heisey s...@elyograg.org wrote:
 On 4/24/2013 5:02 PM, Alexandre Rafalovitch wrote:

 I am testing 4.3rc3. It looks ok, but I notice that some log messages
 could be more informative. For example:
 680 [coreLoadExecutor-3-thread-3] WARN
 org.apache.solr.schema.IndexSchema  – schema has no name!

 Would be _very nice_ to know which core this is complaining about.
 Later, once the core is loaded, the core name shows up in the logs,
 but it would be nice to have it earlier without having to
 triangulate it through 'Loading core' messages.

 Is that too late for 4.3? I know somebody was looking at logging, so
 maybe there is a chance.


 I haven't been around as long as the guys who make the decisions, but I am
 fairly sure that there won't be a new release candidate for a cosmetic
 issue.

 Make sure the issue is filed in Jira so that it can be fixed in 4.4. This is
 something that should definitely be fixed, but from what I have seen, only
 serious bugs will trigger a new RC, and that's only if they don't have a
 viable workaround and they can be fixed quickly.

 Thanks,
 Shawn



Re: Indexing PDF Files

2013-04-24 Thread Jack Krupansky

Does the stock Solr example work for document import?

Here's a sample command that I use:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=features&commit=true" -F "myfile=@myfile.PDF"


That works with the stock Solr example, without any changes.

At least get that working before moving on to the challenge of Solr under 
Tomcat.


Note: The "text" field is not stored, so you can't retrieve the 
content/body of a document from that field.
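
If you do need the extracted body back, one option is to make that field 
stored. A sketch, based on the stock 4.x example schema (adjust the name and 
type to whatever your schema actually uses):

<field name="text" type="text_general" indexed="true" stored="true"
       multiValued="true"/>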


If the stock Solr example works for you, then you just need to consult 
Tomcat documentation as to how to configure lib/jars for an app.


Also, go into solrconfig and look for:

<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

Those lines work for the stock Solr example because you cd to the example 
directory, but they won't work for Tomcat since the cwd is somewhere else.


My vague recollection is that if you let Tomcat expand the war file, then 
you can go into the directory containing the expanded files and edit/move as 
you want. Like, put the necessary directory names in the above two lines.
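
For example, with absolute paths (these paths are hypothetical; substitute 
wherever your Solr distribution actually lives on disk):

<lib dir="/opt/solr-4.2.1/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr-4.2.1/dist/" regex="solr-cell-\d.*\.jar" />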


When Solr starts, it should display log messages about what directories are 
being used for these lib elements (INFO 
org.apache.solr.core.SolrConfig  – Adding specified lib dirs to 
ClassLoader).


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Wednesday, April 24, 2013 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF Files

Here is my definition for handler:

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>




2013/4/25 Furkan KAMACI furkankam...@gmail.com


I just want to search on rich documents but I still get same error. I have
copied example folder into anywhere else at my computer. I have copied 
dist

and contrib folders from my build folder into that copy of example folder
(because solr-cell etc. are within that folders) However I still get same
error. If any of you could help me you are welcome. Here is my schema:


[snip: schema.xml excerpt quoted earlier in this thread]

Re: Pushing a whole set of pdf-files to solr

2013-04-24 Thread sdspieg
(Just documenting my experiences.) I stopped and restarted Solr in the Tomcat
web application manager. Everything seems fine:
http://lucene.472066.n3.nabble.com/file/n4058786/4-25-2013_2-38-43_AM.png
And yet I still get that same error message.





  1   2   >