RE: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Gupta, Abhinav
Not sure, though; it depends on how your schema is configured. If it has WordDelimiter
filters at index or query time, then the search will not behave as you expect.
Check your index creation using the Solr analysis page for this type of string.

Thanks,
Abhinav


-Original Message-
From: Viresh Modi [mailto:viresh.m...@highq.com] 
Sent: 24 September 2013 18:09
To: solr-user@lucene.apache.org
Subject: How can I search the maximum number of a word in particular docs

My query looks like:

start=0&rows=10&hl=true&hl.fl=content&qt=dismax
&q=pookan
&fl=id,application,timestamp,name,score,metaData,metaDataDate
&fq=application:OnlineR3_6_4
&fq=(metaData:channelId/101 OR metaData:channelId/104) &sort=score desc


but I am not getting results in the desired order:

id: OnlineR3_6_4_101_7    content: pookan pookan pookan
id: OnlineR3_6_4_101_20   content: pookan pookan pookan pookan pookan
id: OnlineR3_6_4_101_19   content: pookan pookan pookan pookan
id: OnlineR3_6_4_101_21   content: pookan pookan


Actually, I want the documents in which the particular word matches the most
times in the content field to come first (relevance-based)


Re: Problem loading my codec sometimes

2013-09-24 Thread Shawn Heisey
On 9/24/2013 6:32 PM, Scott Schneider wrote:
> I created my own codec and Solr can find it sometimes and not other times.  
> When I start fresh (delete the data folder and run Solr), it all works fine.  
> I can add data and query it.  When I stop Solr and start it again, I get:
> 
> Caused by: java.lang.IllegalArgumentException: A SPI class of type 
> org.apache.lucene.codecs.Codec with name 'MyCodec' does not exist. You need 
> to add the corresponding JAR file supporting this SPI to your classpath. The 
> current classpath supports the following names: [SimpleText, Appending, 
> Lucene40, Lucene3x, Lucene41, Lucene42]
> 
> I added the JAR to the path and I'm pretty sure Java sees it, or else it 
> would not be using my codec when I start fresh.  (I've looked at the index 
> files and verified that it's using my codec.)  I suppose Solr is asking SPI 
> for my codec based on the codec class name stored in the index files, but I 
> don't see why this would fail when a fresh start works.

What I always recommend for those who want to use custom and contrib
jars is that they put all such jars (and their dependencies) into
${solr.solr.home}/lib, don't use any <lib> directives in solrconfig.xml,
and don't put the sharedLib attribute into solr.xml.  Doing it in any
other way has a tendency to trigger bugs or to cause jars to get loaded
more than once.

The ${solr.solr.home} property defaults to $CWD/solr (CWD is current
working directory for those who don't already know) and is the location
of the solr.xml file.  Note that depending on the exact version of Solr
and which servlet container you are using, there may actually be two
solr.xml files, one which loads Solr into your container and one that
configures Solr.  I am referring to the latter.

If you are using the solr example and its directory layout, the
directory you would need to put all jars into is example/solr/lib ...
which is a directory that doesn't exist and has to be created.

http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

Thanks,
Shawn



Problem loading my codec sometimes

2013-09-24 Thread Scott Schneider
Hello,

I created my own codec and Solr can find it sometimes and not other times.  
When I start fresh (delete the data folder and run Solr), it all works fine.  I 
can add data and query it.  When I stop Solr and start it again, I get:

Caused by: java.lang.IllegalArgumentException: A SPI class of type 
org.apache.lucene.codecs.Codec with name 'MyCodec' does not exist. You need to 
add the corresponding JAR file supporting this SPI to your classpath. The 
current classpath supports the following names: [SimpleText, Appending, 
Lucene40, Lucene3x, Lucene41, Lucene42]

I added the JAR to the path and I'm pretty sure Java sees it, or else it would 
not be using my codec when I start fresh.  (I've looked at the index files and 
verified that it's using my codec.)  I suppose Solr is asking SPI for my codec 
based on the codec class name stored in the index files, but I don't see why 
this would fail when a fresh start works.

Any thoughts?

Thanks,
Scott



RE: Querying a non-indexed field?

2013-09-24 Thread Scott Schneider
Ok, thanks for your answers!

Scott


> -Original Message-
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Wednesday, September 18, 2013 5:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Querying a non-indexed field?
> 
> Moreover, you may be trying to save/optimize in a wrong place. Maybe
> these
> additional indexed fields are not so costly. Maybe you can optimize in
> some
> other part of your setup.
> 
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Sep 18, 2013 5:47 PM, "Chris Hostetter" 
> wrote:
> 
> >
> > : Subject: Re: Querying a non-indexed field?
> > :
> > : No.  --wunder
> >
> > To elaborate just a bit...
> >
> > : query on a few indexed fields, getting a small # of results.  I
> want to
> > : restrict this further based on values from non-indexed, stored
> fields.
> > : I can obviously do this myself, but it would be nice if Solr could
> do
> >
> > ...you could implement this in a custom SearchComponent, or custom
> qparser
> > that would generate PostFilter compatible queries, that looked at the
> > stored field values -- but it's extremely unlikely that you would
> ever
> > convince any of the lucene/solr devs to agree to commit a general
> purpose
> > version of this type of logic into the code base -- because in the
> general
> > case (arbitrary unknown number of documents matching the main query)
> it
> > would be extremely inefficient and would encourage "bad" user
> behavior.
> >
> > -Hoss
> >



Re: Problem running EmbeddedSolr (spring data)

2013-09-24 Thread Furkan KAMACI
Run the Maven dependency tree command (mvn dependency:tree) and you can easily
find the cause of the dependency conflict; if not, you can send your
command-line output and we can help you.

On Saturday, 21 September 2013, Erick Erickson wrote:
> bq: Caused by: java.lang.NoSuchMethodError:
>
> This usually means that you have a mixture of old and new jars around
> and have compiled against one and are finding the other one in your
> classpath.
>
> Best,
> Erick
>
> On Fri, Sep 20, 2013 at 9:37 AM, JMill 
wrote:
>> What is the cause of this stacktrace?
>>
>> I am working with the following Solr Maven dependencies:
>>
>> <solr-core-version>4.4.0</solr-core-version>
>> 1.0.0.RC1
>>
>> Stacktrace
>>
>> SEVERE: Exception sending context initialized event to listener instance
of
>> class org.springframework.web.context.ContextLoaderListener
>> org.springframework.beans.factory.BeanCreationException: Error creating
>> bean with name 'solrServerFactoryBean' defined in class path resource
>> [com/project/core/config/EmbeddedSolrContext.class]: Invocation of init
>> method failed; nested exception is java.lang.NoSuchMethodError:
>>
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1482)
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:521)
>> at
>>
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295)
>> at
>>
org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292)
>> at
>>
org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194)
>> at
>>
org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:608)
>> at
>>
org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932)
>> at
>>
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479)
>> at
>>
org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:389)
>> at
>>
org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:294)
>> at
>>
org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112)
>> at
>>
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4887)
>> at
>>
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5381)
>> at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>> at
>>
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
>> at
>>
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> at
>>
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>>
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.lang.NoSuchMethodError:
>>
org.apache.solr.core.CoreContainer.<init>(Ljava/lang/String;Ljava/io/File;)V
>> at
>>
org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.createPathConfiguredSolrServer(EmbeddedSolrServerFactory.java:9


Re: Best practice to index and query SolrCloud

2013-09-24 Thread shamik
 Thanks for the insight Shawn, extremely helpful. 

Appreciate it.





Best practice to index and query SolrCloud

2013-09-24 Thread shamik
Hi, I'm new to SolrCloud, trying to set up a test environment based on the
wiki documentation. Based on the example, the setup and sample indexing /
query works fine. But I need some pointers on the best practices of indexing
/ querying in SolrCloud. E.g., I've 2 shards, with 1 leader and a
corresponding replica each. Let's say each of them is running on its own
dedicated server.

Now, I'm using the SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any of
the four servers or to a specific shard id. Is it advisable to use the server
information directly in the client? In case the specific node goes down, then
indexing will fail. Is it recommended to have a load balancer (HAProxy, ELB
in Amazon) for the indexing purpose?

Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.

Regards,
Shamik




Re: Best practice to index and query SolrCloud

2013-09-24 Thread Shawn Heisey

On 9/24/2013 2:46 PM, Shamik Bandopadhyay wrote:

Now, I'm using SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any
of the four servers or to a specific shard id. Is it advisable to use the
server information directly in the client? In case the specific node
goes down, then indexing will fail. Is it recommended to have a load
balancer (Haproxy , ELB in Amazon) for the indexing purpose ?


CloudSolrServer contains a zookeeper client.  When you create an 
instance, you don't give it the URL for Solr, you tell it about your 
zookeeper ensemble, using the same zkHost info you give to Solr itself. 
 It is always aware of the clusterstate and uses that information to 
decide where the actual Solr requests go.


When SolrJ 4.5 comes out (which is going to be very soon), it will know 
how to route updates to the correct shard leader, so indexing will be 
even more efficient.


You will only need a load balancer if you use Solr URLs directly or use 
a programming API that is unaware of zookeeper.



Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.


What I said above for indexing applies equally to queries. 
CloudSolrServer will load balance queries across all operational servers 
automatically.
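
A minimal SolrJ sketch of this (assuming the Solr 4.4-era API; the zkHost
string and collection name are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudExample {
      public static void main(String[] args) throws Exception {
        // the same zkHost string you give Solr itself -- no Solr URLs needed
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        // indexing: the zookeeper-aware client picks a live node from clusterstate
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "hello solrcloud");
        server.add(doc);
        server.commit();

        // querying: requests are load balanced across operational servers
        QueryResponse rsp = server.query(new SolrQuery("title:hello"));
        System.out.println(rsp.getResults().getNumFound());
      }
    }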


Thanks,
Shawn



Re: Some text not indexed in solr4.4

2013-09-24 Thread Utkarsh Sengar
WordDelimiterFilterFactory was the culprit. Removing that fixed the problem.


Thanks,
-Utkarsh


On Tue, Sep 24, 2013 at 12:17 PM, Utkarsh Sengar wrote:

> @Furkan Yes, I have run a commit, other text is searchable.
> Not sure what you mean there for MultiPhraseQuery. It is mentioned in
> context to SynonymFilterFactory, RemoveDuplicatesTokenFilterFactory and
> PositionFilterFactory. Which part are you referring to?
>
> @Jason I get this response (I have multi-core setup) by hitting this URL:
> http://SOLR_SERVER/solr/prodinfo/terms?terms.fl=text&terms.prefix=dc
>
> <int name="status">0</int><int name="QTime">0</int> <lst name="text"/>
>
> Not sure how I can infer anything from this response. I get the same response for any
> prefix like: a, b, iph etc.
>
> My guess is this is happening due to WordDelimiterFilterFactory here:
> https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L16, what do
> you think? Is dc44 somehow delimited at query time?
> Example here says:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> -> Split on letter-number transitions (can be turned off - see
> splitOnNumerics parameter) "SD500" -> "SD", "500"
>
> I will test it out and update this thread with my findings.
>
> Thanks,
> -Utkarsh
>
>
>
> On Tue, Sep 17, 2013 at 5:10 PM, Jason Hellman <
> jhell...@innoventsolutions.com> wrote:
>
>> Utkarsh,
>>
>> Check to see if the value is actually indexed into the field by using the
>> Terms request handler:
>>
>> http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
>>
>> (adjust the prefix to whatever you're looking for)
>>
>> This should get you going in the right direction.
>>
>> Jason
>>
>>
>> On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar 
>> wrote:
>>
>> > I have a copyField called allText with type text_general:
>> > https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
>> >
>> > I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
>> >
>> > For example:
>> > "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum"
>> > "description": "The DC44 Animal is the new Dyson Digital Slim vacuum
>> > cleaner  the cordless machine that doesn’t lose suction. It has been
>> > engineered for floor to ceiling cleaning. DC44 Animal has a detachable
>> > long-reach wand  which is balanced for floor to ceiling cleaning.   The
>> > motorized floor tool has twice the power of the DC35 floor tool  to
>> drive
>> > the bristles deeper into the carpet pile with more force. It attaches to
>> > the wand or directly to the machine for cleaning awkward spaces. The
>> brush
>> > bar has carbon fiber filaments for removing fine dust from hard floors.
>> > DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
>> > Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
>> > manganese cobalt battery and Root Cyclone technology for constant
>>  powerful
>> > suction.",
>> > UPC: 0879957006362
>> >
>> > The documents are indexed.
>> >
>> > Analysis says it's indexed: http://i.imgur.com/O52ino1.png
>> > But when I search for allText:"dyson dc44" I get no results, response:
>> > http://pastie.org/8334220
>> >
>> > Any suggestions about the problem? I am out of ideas about how to debug
>> > this.
>> >
>> > --
>> > Thanks,
>> > -Utkarsh
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh


Best practice to index and query SolrCloud

2013-09-24 Thread Shamik Bandopadhyay
Hi,

  I'm new to SolrCloud , trying to set up a test environment based on
the wiki documentation. Based on the example, the setup and sample indexing
/ query works fine. But I need some pointers on the best practices of
indexing / querying in SolrCloud. E.g., I've 2 shards, with 1 leader and a
corresponding replica each. Let's say each of them is running on its own
dedicated server.

Now, I'm using SolrJ client (CloudSolrServer) to send documents for
indexing. Based on SolrCloud fundamentals, I can send the document to any
of the four servers or to a specific shard id. Is it advisable to use the
server information directly in the client? In case the specific node
goes down, then indexing will fail. Is it recommended to have a load
balancer (Haproxy , ELB in Amazon) for the indexing purpose ?

Same applies during query time. I know we can add a query parameter and
include all four server information. But then any change in the server
configuration will have an impact. Any help will be appreciated.

Thanks,

Shamik


Re: Soft commit and flush

2013-09-24 Thread Shawn Heisey

On 9/24/2013 5:51 AM, adfel70 wrote:

My conclusion is that soft commit always flushes the data, but because of
the implementation of NRTCachingDirectoryFactory, the data will be written
to the disk when it's getting too big.


The NRTCachingDirectoryFactory (which creates NRTCachingDirectory 
instances) used by default in newer Solr versions has default settings 
for some of its parameters that show up in the solr log:


maxCacheMB=48.0 maxMergeSizeMB=4.0

The constructor javadocs for NRTCachingDirectory show what circumstances 
will cause the directory to use RAM instead of flushing to disk:


http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html#NRTCachingDirectory%28org.apache.lucene.store.Directory,%20double,%20double%29

"We will cache a newly created output if 1) it's a flush or a merge and 
the estimated size of the merged segment is <= maxMergeSizeMB, and 2) 
the total cached bytes is <= maxCachedMB"
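
A minimal Lucene sketch of that wrapping (assuming the Lucene 4.4 API; the
index path is a placeholder):

    import java.io.File;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.NRTCachingDirectory;

    public class NrtDirExample {
      public static void main(String[] args) throws Exception {
        // wrap an on-disk directory with the defaults Solr logs:
        // maxMergeSizeMB=4.0, maxCachedMB=48.0
        FSDirectory fsDir = FSDirectory.open(new File("/path/to/index"));
        NRTCachingDirectory dir = new NRTCachingDirectory(fsDir, 4.0, 48.0);
        // small flushed or merged segments now stay in RAM until they grow
        // past those caps, at which point they are written through to fsDir
        System.out.println(dir);
      }
    }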


Thanks,
Shawn



Re: Some text not indexed in solr4.4

2013-09-24 Thread Utkarsh Sengar
@Furkan Yes, I have run a commit, other text is searchable.
Not sure what you mean there for MultiPhraseQuery. It is mentioned in
context to SynonymFilterFactory, RemoveDuplicatesTokenFilterFactory and
PositionFilterFactory. Which part are you referring to?

@Jason I get this response (I have multi-core setup) by hitting this URL:
http://SOLR_SERVER/solr/prodinfo/terms?terms.fl=text&terms.prefix=dc

<int name="status">0</int><int name="QTime">0</int> <lst name="text"/>

Not sure how I can infer anything from this response. I get the same response for any
prefix like: a, b, iph etc.

My guess is this is happening due to WordDelimiterFilterFactory here:
https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L16, what do
you think? Is dc44 somehow delimited at query time?
Example here says:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
-> Split on letter-number transitions (can be turned off - see
splitOnNumerics parameter) "SD500" -> "SD", "500"

I will test it out and update this thread with my findings.

Thanks,
-Utkarsh



On Tue, Sep 17, 2013 at 5:10 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Utkarsh,
>
> Check to see if the value is actually indexed into the field by using the
> Terms request handler:
>
> http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
>
> (adjust the prefix to whatever you're looking for)
>
> This should get you going in the right direction.
>
> Jason
>
>
> On Sep 17, 2013, at 2:20 PM, Utkarsh Sengar  wrote:
>
> > I have a copyField called allText with type text_general:
> > https://gist.github.com/utkarsh2012/6167128#file-schema-xml-L68
> >
> > I have ~100 documents which have the text: dyson and dc44 or dc41 etc.
> >
> > For example:
> > "title": "Dyson DC44 Animal Digital Slim Cordless Vacuum"
> > "description": "The DC44 Animal is the new Dyson Digital Slim vacuum
> > cleaner  the cordless machine that doesn’t lose suction. It has been
> > engineered for floor to ceiling cleaning. DC44 Animal has a detachable
> > long-reach wand  which is balanced for floor to ceiling cleaning.   The
> > motorized floor tool has twice the power of the DC35 floor tool  to drive
> > the bristles deeper into the carpet pile with more force. It attaches to
> > the wand or directly to the machine for cleaning awkward spaces. The
> brush
> > bar has carbon fiber filaments for removing fine dust from hard floors.
> > DC44 Animal has a run time of 20 minutes or 8 minutes on Boost mode.
> > Powered by the Dyson digital motor  DC44 Animal has a fade-free nickel
> > manganese cobalt battery and Root Cyclone technology for constant
>  powerful
> > suction.",
> > UPC: 0879957006362
> >
> > The documents are indexed.
> >
> > Analysis says it's indexed: http://i.imgur.com/O52ino1.png
> > But when I search for allText:"dyson dc44" I get no results, response:
> > http://pastie.org/8334220
> >
> > Any suggestions about the problem? I am out of ideas about how to debug
> > this.
> >
> > --
> > Thanks,
> > -Utkarsh
>
>


-- 
Thanks,
-Utkarsh


ANNOUNCE: Lucene/Solr Revolution EU 2013 - Session List & Early Bird Pricing

2013-09-24 Thread Chris Hostetter


(NOTE: cross-posted to various lists, please reply only to general@lucene 
w/ any questions or follow ups)


Hey folks,

2 announcements regarding the upcoming Lucene/Solr Revolution EU 2013 in 
Dublin (November 4-7)...


## 1) Session List Now Posted

I'd like to thank everyone who helped vote for the sessions that 
interested you during the community voting period.  The bulk of the 
sessions that were selected, and will be presented, are now listed online 
-- a few more will be added once we get final confirmation from the 
remaining speakers who were selected...


  http://lucenerevolution.org/sessions

## 2) Early Bird Pricing Ends Soon

"Early bird" discount registration pricing is available until Monday, 
September 30th -- after that, the registration cost will increase by $100 
USD.  So if you are planning to go, you should register soon and save some 
money...


  http://lucenerevolution.org/registration


Additional details about the conference can be found at the website, or 
feel free to reply to this email with any questions...


  http://lucenerevolution.org


-Hoss


dih HTMLStripTransformer

2013-09-24 Thread Andreas Owen
Why does stripHTML="false" have no effect in DIH? The HTML is stripped in text
and text_nohtml when I display the index with select?q=*.

I'm trying to get one field without HTML and one with it, so I can also index
the links on the page.

data-config.xml

Re: Get only those documents that are fully satisfied.

2013-09-24 Thread asuka
Thanks Chris,
that's exactly what I was looking for.

One last question. As far as I can see, the solution you are offering,
termfreq, is for Solr 4+, isn't it?

Right now I'm working with Solr 3.6.2. Is there any solution for such
version or do I need an upgrade?

Kind Regards





Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-24 Thread JMill
Hi,

I'm using Solr's Suggester function to implement an autocomplete feature.
I have it set up to check against the "username" and "name" fields.  The
problem is that when running a query against the name, the second term, after
the whitespace (the surname), returns 0 results.  It works if the query is a
partial name starting from the beginning, e.g. given the name "Bill Rogers",
a query for "Rogers" will return 0 results whereas a query for "Bill" will
return a positive result (Bill Rogers). As for the username, it's not working
at all.

I am after the following behaviour.

Match any partial words in the fields "username" or "name" and return the
results.  If there is a match in the field "name" then return the whole name,
e.g. given the queries "Rogers" or "Bill", return "Bill Rogers" (not the
single word that was a match).

schema.xml extract
...

solrconfig.xml

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">autocomplete</str>
    <float name="threshold">0.005</float>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

..

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


Re: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Chris Hostetter
: &q=pookan
...
: Acutually i want particular word for that match max in content tag that
: come first (relevancy based)

the default TF-IDF scoring mechanism rewards documents for matching a term
multiple times (that's the "TF" part) but there is also a length
normalization factor that comes into play -- the idea being that a very
short document -- maybe only a paragraph -- containing 2 instances of
"pookan" is probably more relevant than a longer document containing 3
instances of the same term, if that longer document is several thousand
paragraphs.

you can disable this length normalization by using "omitNorms"...

https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties


-Hoss


SOLR grouped query sorting on numFound

2013-09-24 Thread Brent Ryan
We ran into a snag during development with Solr and I thought I'd run it by
everyone to see if anyone had any slick ways to solve this issue.

Basically, we're performing a SOLR query with grouping and want to be able
to sort by the number of documents found within each group.

Our query response from SOLR looks something like this:

{
  "responseHeader":{
    "status":0,
    "QTime":17,
    "params":{
      "indent":"true",
      "q":"*:*",
      "group.limit":"0",
      "group.field":"rfp_stub",
      "group":"true",
      "wt":"json",
      "rows":"1000"}},
  "grouped":{
    "rfp_stub":{
      "matches":18470,
      "groups":[{
          "groupValue":"java.util.UUID:a1871c9e-cd7f-4e87-971d-d8a44effc33e",
          "doclist":{"numFound":3,"start":0,"docs":[]}},
        {
          "groupValue":"java.util.UUID:0c2f1045-a32d-4a4d-9143-e09db45a20ce",
          "doclist":{"numFound":5,"start":0,"docs":[]}},
        {
          "groupValue":"java.util.UUID:a3e1d56b-4172-4594-87c2-8895c5e5f131",
          "doclist":{"numFound":6,"start":0,"docs":[]}},
        ...


The numFound shows the number of documents within that group.  Is there
any way to perform a sort on numFound in Solr?  I don't believe this is
supported, but wondered if anyone here has come across this and if there
were any suggested workarounds, given that the dataset is really too large to
hold in memory on our app servers?


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Chris Hostetter

: Your requirement is still somewhat ambiguous - you use "fully" and "some" in
: the same sentence. Which is it?

the request seems pretty clear to me...

:   I don't want to get documents that fit my whole query, I want those
: documents that are fully satisfied  with some terms of the query.

...my reading is:

 * given a set of documents each containing an arbitrary number of 
"doc_terms" in "field_f"
 * given a query "q" containing an arbitrary number of "q_terms"
 * find all documents where every "doc_term" in that document's "field_f" 
exists in the query as a "q_term"

ie: all terms of the document must exist in the query for the doc to 
match, but not all terms from the query must exist in a document.

There is no trivial out of the box solution at the moment, but there is a 
solution possible using function queries as described in 
this email...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201308.mbox/%3Calpine.DEB.2.02.1308091122150.2685@frisbee%3E

Repeating the key bits below...

-Hoss


...

1) if you don't care about using non-trivial analysis (ie: you don't need 
stemming, or synonyms, etc..), you can do this with some really simple 
function queries -- assuming you index a field containing the number of 
"words" in each document, in addition to the words themselves.  Assuming 
your words are in a field named "words" and the number of words is in a 
field named "words_count" a request for something like "Galaxy Samsung S4" 
can be represented as...

  q={!frange l=0 u=0}sub(words_count,
     sum(termfreq('words','Galaxy'),
         termfreq('words','Samsung'),
         termfreq('words','S4')))

...ie: you want to compute the sum of the term frequencies for each of 
the words requested, and then you want to subtract that sum from the 
number of terms in the document -- and then you only want to match 
documents where the result of that subtraction is 0.

one complexity that comes up, is that you haven't specified:
  
  * can the list of words in your documents contain duplicates?
  * can the list of words in your query contain duplicates?
  * should a document with duplicate words match only if the query also 
contains the same word duplicated?

...the answers to those questions make the math more complicated (and are 
left as an exercise for the reader)


2) if you *do* care about using non-trivial analysis, then you can't use 
the simple "termfreq()" function, which deals with raw terms -- instead 
you have to use the "query()" function to ensure that the input is parsed 
appropriately -- but then you have to wrap that function in something that 
will normalize the scores - so in place of termfreq('words','Galaxy') 
you'd want something like...

if(query({!field f=words v='Galaxy'}),1,0)

...but again the math gets much harder if you make things more complex 
with duplicate words in the document or duplicate words in the query -- 
you'd probably have to use a custom similarity to get the scores returned 
by the query() function to be usable as is in the match equation (and drop 
the "if()" function)
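
A SolrJ sketch of wiring up option 1 (the field names words/words_count and
the terms come from the example above; this is a sketch only, assuming a
Solr 4.x HttpSolrServer):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FullySatisfiedExample {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // match only docs where every indexed term also appears in the query:
        // words_count minus the summed term frequencies must equal 0
        SolrQuery q = new SolrQuery(
            "{!frange l=0 u=0}sub(words_count,"
          + "sum(termfreq('words','Galaxy'),"
          + "termfreq('words','Samsung'),"
          + "termfreq('words','S4')))");
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound() + " fully satisfied docs");
      }
    }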


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Jack Krupansky
Your requirement is still somewhat ambiguous - you use "fully" and "some" in 
the same sentence. Which is it?


If you simply want documents that contain every one of the query terms, use 
the explicit AND operator ("+" or "AND") or set the default operator to 
"AND".


But... we are still in the dark as to your precise requirement.

-- Jack Krupansky

-Original Message- 
From: asuka

Sent: Tuesday, September 24, 2013 11:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Get only those documents that are fully satisfied.

Hi Andre,
  I don't want to get documents that fit my whole query, I want those
documents that are fully satisfied  with some terms of the query.

In other words, I'm interested in an exact match from the point of view of
the document, not from the point of view of the query.

Asuka



Andre Bois-Crettez wrote

(Your schema and query only appear on the nabble.com forum, it is mostly
empty for me on the mailing list)

What you want is probably to change OR to AND:

params.set("q.op", "AND");









Re: Excluding a facet's constraint to exclude a facet

2013-09-24 Thread Chris Hostetter

: documentation that I can limit results to category "A" as follows:
: 
: fq={!raw f=foo}A
: 
: But I cannot seem to (Solr 3.6.1) exclude that way:
: 
: fq={!raw f=foo}-A

with the "raw" qparser, there is no markup syntax at all -- so it's 
interpreting the "-" as part of the literal term value you are trying to 
query for.

: And the simpler test (with edismax) doesn't work either:
: 
: fq=foo:A# works
: fq=foo:-A   # doesn't work

likewise: in the lucene/dismax/edismax parsers, operators (like "-" and 
"+") need to come before the field you are querying on...

   fq=-foo:A
or fq={!edismax}-foo:A


If you upgrade to a more current 4.x version of Solr, then the 
default (lucene) parser in solr has been updated to recognize 
nested parser syntax (ie: "{!parser}input") as inline clauses, so you can 
use something like this...

   fq=-{!raw f=foo}A

...which results in the (default) lucene parser recognizing the "-" 
operator should be applied to a nested clause which is generated by asking 
the "raw" parser to use the local param "f=foo" when parsing the input "A"

One thing to watch out for however is whitespace -- if you have a query 
like this...

   fq=-{!raw f=foo}AAA BBB

...then the (default) lucene parser gets 2 clauses: a negated clause 
resulting from asking the "raw" parser to parse "AAA" and a positive 
clause that the lucene parser parses itself, using the default search 
field to look for "BBB".

You would need to use the "v" local param to ensure that the entire string 
gets parsed by the raw parser, either...

   fq=-{!raw f=foo v='AAA BBB'}
or fq=-{!raw f=foo v=$my_foo_fq}&my_foo_fq=AAA BBB




-Hoss


Excluding a facet's constraint to exclude a facet

2013-09-24 Thread Dan Davis
Summary - when constraining a search using filter query, how can I exclude
the constraint for a particular facet?

Detail - Suppose I have the following facet results for a query "q=mainquery":

<lst name="foo">
  <int>491</int>
  <int>111</int>
  <int>103</int>
  ...
</lst>
...

I understand from
http://people.apache.org/~hossman/apachecon2010/facets/ and the Wiki
documentation that I can limit results to category "A" as follows:

fq={!raw f=foo}A

But I cannot seem to (Solr 3.6.1) exclude that way:

fq={!raw f=foo}-A

And the simpler test (with edismax) doesn't work either:

fq=foo:A# works
fq=foo:-A   # doesn't work

Do I need to be using facet.method=enum to get this to work?   What else
could be the problem here?


Re: How to sort over all documents by score after Result Grouping / Field Collapsing

2013-09-24 Thread go2jun
Thanks Erick for your response.

My goal is:
1. Search from Solr. In the search results, we would like to show no more
than two results from the same source id.
2. We would like all these results sorted by their score.

So if I use Solr result grouping to get the top two results from each group,
then I need to un-group them.

So my question is: is there any pure Solr solution to handle this? I'd prefer
it be handled by Solr rather than by my application, because the search
results are very large.

Thanks!
Jun






Re: How can I search the maximum number of a word in particular docs

2013-09-24 Thread Upayavira
Are you saying that the more times the word appears, the higher you want
it to score?

Note: add debugQuery=true to your query and look near the end of the
output; you will be able to see exactly how the score was calculated and
thus which component wasn't behaving as you expected (you might want to
review this info alongside a Solr book or two, it is quite complex).

Looking at your example below, I suspect that all of your examples have
the same score, so are sorted randomly.

Upayavira

On Tue, Sep 24, 2013, at 01:38 PM, Viresh Modi wrote:
> My query looks like:
> 
> start=0&rows=10&hl=true&hl.fl=content&qt=dismax
> &q=pookan
> &fl=id,application,timestamp,name,score,metaData,metaDataDate
> &fq=application:OnlineR3_6_4
> &fq=(metaData:channelId/101 OR metaData:channelId/104)
> &sort=score desc
> 
> 
> but I am not getting results in the desired order:
> 
> id: OnlineR3_6_4_101_7    content: pookan pookan pookan
> id: OnlineR3_6_4_101_20   content: pookan pookan pookan pookan pookan
> id: OnlineR3_6_4_101_19   content: pookan pookan pookan pookan
> id: OnlineR3_6_4_101_21   content: pookan pookan
> 
> 
> 
> Actually, I want the documents in which the particular word matches the
> most times in the content field to come first (relevance-based)


Re: DIH field defaults or re-assigning field values

2013-09-24 Thread P Williams
I discovered how to use the ScriptTransformer, which worked to solve my
problem.  I had to make use of context.setSessionAttribute(...,...,'global')
to store a flag for the value in the file, because the script is only called
if there are rows to transform and I needed to know when the default was
appropriate to set in the root entity.

Thanks for your suggestions Alex.

Cheers,
Tricia


On Wed, Sep 18, 2013 at 1:19 PM, P Williams
wrote:

> Hi All,
>
> I'm using the DataImportHandler to import documents to my index.  I assign
> one of my document's fields by using a sub-entity from the root to look for
> a value in a file.  I've got this part working.  If the value isn't in the
> file or the file doesn't exist I'd like the field to be assigned a default
> value.  Is there a way to do this?
>
> I think I'm looking for a way to re-assign the value of a field.  If this
> is possible then I can assign the default value in the root entity and
> overwrite it if the value is found in the sub-entity. Ideas?
>
> Thanks,
> Tricia
>


Re: Interesting edismax/qs bug in Solr 3.5

2013-09-24 Thread Arcadius Ahouansou
Thanks Michael.

Arcadius.


On 23 September 2013 05:32, Michael Ryan  wrote:

> Sounds like https://issues.apache.org/jira/browse/LUCENE-3821 (issue
> seems to be fixed but still shows as open).
>
> -Michael
>
> -Original Message-
> From: Arcadius Ahouansou [mailto:arcad...@menelic.com]
> Sent: Sunday, September 22, 2013 11:15 PM
> To: solr-user
> Subject: Interesting edismax/qs bug in Solr 3.5
>
> We have been seeing a strange bug in our prod Solr 3.5.
>
> I went to download a fresh copy of Solr3.5, with default schema  and
> indexed (curl or post.jar) the following 2 docs
>
> [
>{
>   "id":"1",
>   "title":"One Earth"
>},
>{
>   "id":"2",
>   "title":"One Love One Earth"
>}
>
> ]
>
>
> I could browse and see the docs in solr.
>
> However, when I do:
> /solr/select?q="One Love One Earth"&qf=title&qs=2&defType=edismax&pf=title
>
> I get nothing back.
> when I change qs=4 in the query, then I see the expected doc2.
> debugQuery=true does not reveal anything.
>
> - I have noticed that when I reverse the order of the documents in the
> input file, i.e. doc2 first, then doc1, and do the indexing (using curl or
> post.jar), then the query above works and returns doc2 as expected.
> - Same when I index only doc2 (doc1 not indexed).
>
> I tested solr3.6.2  and 4.4.0 and I can confirm they are not affected by
> this issue.
>
> I looked at the change logs for 3.6.2 and jira but could not find any
> trace of this problem.
>
> Any pointer to the ticket that addressed this issue will be appreciated.
>
>
> Thank you very much.
>
>
> Arcadius.
> .
>


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread asuka
Hi Andre,
   I don't want to get documents that fit my whole query, I want those
documents that are fully satisfied  with some terms of the query.

In other words, I'm interested in an exact match from the point of view of
the document, not from the point of view of the query.

Asuka



Andre Bois-Crettez wrote
> (Your schema and query only appear on the nabble.com forum, it is mostly
> empty for me on the mailing list)
> 
> What you want is probably to change OR to AND:
> 
> params.set("q.op", "AND");







Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Walter Underwood
Did all of the curl update commands return success? Any errors in the logs?

wunder

On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:

> Is it possible that some of those 80K docs were simply not valid? e.g.
> had a wrong field, had a missing required field, anything like that?
> What happens if you clear this collection and just re-run the same
> indexing process and do everything else the same?  Still some docs
> missing?  Same number?
> 
> And what if you take 1 document that you know is valid and index it
> 80K times, with a different ID, of course?  Do you see 80K docs in the
> end?
> 
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena  wrote:
>> Doc count did not change after I restarted the nodes. I am doing a single
>> commit after all 80k docs. Using Solr 4.4.
>> 
>> Regards,
>> Saurabh
>> 
>> 
>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>> 
>>> Interesting. Did the doc count change after you started the nodes again?
>>> Can you tell us about commits?
>>> Which version? 4.5 will be out soon.
>>> 
>>> Otis
>>> Solr & ElasticSearch Support
>>> http://sematext.com/
>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:
>>> 
 Hello,
 
 I am testing High Availability feature of SolrCloud. I am using the
 following setup
 
 - 8 linux hosts
 - 8 Shards
 - 1 leader, 1 replica / host
 - Using Curl for update operation
 
 I tried to index 80K documents on replicas (10K/replica in parallel).
 During indexing process, I stopped 4 Leader nodes. Once indexing is done,
 out of 80K docs only 79808 docs are indexed.
 
 Is this an expected behaviour ? In my opinion replica should take care of
 indexing if leader is down.
 
 If this is an expected behaviour, any steps that can be taken from the
 client side to avoid such a situation.
 
 Regards,
 Saurabh Saxena
 
>>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Select all descendants in a relation index

2013-09-24 Thread Oussama Mubarak

Thank you Erick.

I actually do need it to extend to grandchildren as stated in "I need to 
be able to find *all descendants* of a node with one query".
I already have an index that allows me to find the direct children of a 
node; what I need is to be able to get all descendants of a node
(children, grandchildren, etc.).


I have submitted this question on Stack Overflow, where I put in more
details:
http://stackoverflow.com/questions/18984183/join-query-in-apache-solr-how-to-get-all-levels-in-hierarchical-data


Semiaddict


On 24/09/2013 16:08, Erick Erickson wrote:

Sure, index the parent node id (perhaps multiple) with each child
and add &fq=parent_id:12.

you can do the reverse and index each node with its child node IDs
to ask the inverse question.

This won't extend to grandchildren/parents, but you haven't stated that you
need to do this.

Best,
Erick

On Mon, Sep 23, 2013 at 6:23 PM, Semiaddict  wrote:

Hello,

I am using Solr to index Drupal node relations (over 300k relations on over 
500k nodes), where each relation consists of the following fields:
- id : the id of the relation
- source_id : the source (parent) node id
- target_id : the target (child) node id

I need to be able to find all descendants of a node with one query.
So far I've managed to get direct children using the join syntax of Solr4 such 
as (http://wiki.apache.org/solr/Join):
/solr/collection/select?q={!join from=source_id to=target_id}source_id:12

Note that each node can have multiple parents and multiple children.

Is there a way to get all descendants of node 12 without having to create a 
loop in PHP to find all children, then all children of each child, etc ?
If not, is it possible to create a recursive query directly in Solr, or is 
there a better way to index tree structures ?

Any help or suggestion would be highly appreciated.

Thank you in advance,

Semiaddict




Re: [DIH] Logging skipped documents

2013-09-24 Thread Stefan Matheis
Jérôme

Just had a quick look at the source at 
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/XPathEntityProcessor.java?view=markup#l324
.. it looks like there is a LOG.warn(msg, e); statement on line 331, where msg 
should include the URL of the document that was tried?

Otherwise, if that's not the place where the exception happens .. you might be 
able to add LOG statements yourself and compile Solr from source (again) 
to make that work?

-Stefan  


On Monday, September 23, 2013 at 2:32 PM, jerome.dup...@bnf.fr wrote:

>  
> Hello,
>  
> I have a question: I index documents and a small part of them are skipped (I
> am in onError="skip" mode).
> I'm trying to get a list of them, in order to analyse what's wrong with
> these documents.
> Is there a means to get the list of skipped documents, and some more
> information? (my onError="skip" is in an XPathEntityProcessor; the name of
> the file processed would be OK)
>  
>  
> Regards,
> ---
> Jérôme Dupont
> Bibliothèque Nationale de France
> Département des Systèmes d'Information
> Tour T3 - Quai François Mauriac
> 75706 Paris Cedex 13
> telephone: 33 (0)1 53 79 45 40
> e-mail: jerome.dup...@bnf.fr (mailto:jerome.dup...@bnf.fr)
> ---
>  
>  
>  
> Take part in the Grande Collecte 1914-1918. Before printing, think of the
> environment.  



Re: Solr query processing

2013-09-24 Thread Erick Erickson
bq: As an aside, it would be nice if the queryparser could do the same
thing in Lucene

Lucene does not and (probably) will not ever know anything about the
schema. It's
purposely unaware of this higher-level construct. I wish you great good luck
persuading the lucene guys to have anything like a schema, you'll need it ;).

Best,
Erick



On Mon, Sep 23, 2013 at 9:44 PM, Otis Gospodnetic
 wrote:
> That's right.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Sep 23, 2013 12:55 PM, "Scott Smith"  wrote:
>
>> I just want to state a couple of things and hear someone say, "that's
>> right".
>>
>>
>> 1.   In a solr query you can have multiple fq's, but only a single q.
>>  And yes, I can simply AND the multiple "q"s together.  Just want to avoid
>> that if I'm wrong.
>>
>> 2.   A subtler issue is that when a full query is executed, Solr must
>> look at the schema to see how each field was tokenized (or not) and the
>> various other filters applied to a field so that it can properly transform
>> field data (e.g., tokenize the text, but not keywords).  As an aside, it
>> would be nice if the queryparser could do the same thing in Lucene (I know,
>> wrong forum :)).
>> Scott
>>


Re: Select all descendants in a relation index

2013-09-24 Thread Erick Erickson
Sure, index the parent node id (perhaps multiple) with each child
and add &fq=parent_id:12.

you can do the reverse and index each node with its child node IDs
to ask the inverse question.

This won't extend to grandchildren/parents, but you haven't stated that you
need to do this.

Best,
Erick

On Mon, Sep 23, 2013 at 6:23 PM, Semiaddict  wrote:
> Hello,
>
> I am using Solr to index Drupal node relations (over 300k relations on over 
> 500k nodes), where each relation consists of the following fields:
> - id : the id of the relation
> - source_id : the source (parent) node id
> - target_id : the target (child) node id
>
> I need to be able to find all descendants of a node with one query.
> So far I've managed to get direct children using the join syntax of Solr4 
> such as (http://wiki.apache.org/solr/Join):
> /solr/collection/select?q={!join from=source_id to=target_id}source_id:12
>
> Note that each node can have multiple parents and multiple children.
>
> Is there a way to get all descendants of node 12 without having to create a 
> loop in PHP to find all children, then all children of each child, etc ?
> If not, is it possible to create a recursive query directly in Solr, or is 
> there a better way to index tree structures ?
>
> Any help or suggestion would be highly appreciated.
>
> Thank you in advance,
>
> Semiaddict


Re: Indexing bulk loads of PDF files and extracting information from them

2013-09-24 Thread Erick Erickson
Consider using a SolrJ program, perhaps multiple
ones running in parallel.

See: http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
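
If the extraction itself should also go through Solr's ExtractingRequestHandler,
here is a rough SolrJ sketch posting one PDF at a time; it assumes the stock
/update/extract handler is enabled in solrconfig.xml, and the URL, directory
path, and field names are placeholders:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class PdfIndexer {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        for (File pdf : new File("/path/to/pdfs").listFiles()) {
          ContentStreamUpdateRequest req =
              new ContentStreamUpdateRequest("/update/extract");
          req.addFile(pdf, "application/pdf");
          // give each doc a unique id; prefix unknown Tika fields out of the way
          req.setParam("literal.id", pdf.getName());
          req.setParam("uprefix", "attr_");
          req.setParam("fmap.content", "text");
          server.request(req);
        }
        server.commit();
      }
    }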

Best,
Erick

On Mon, Sep 23, 2013 at 3:31 PM, Sadika Amreen  wrote:
> Hi all,
>
>
>
> I am looking to index the entire directory of PDF files. We have a very large 
> volume of PDFs (3000+, possibly much more), so adding them manually would be 
> cumbersome.
>
>
>
> I have seen more than a couple of dozen links explaining how to index PDFs 
> using Solr, but none were detailed enough to help me get started.
>
> I understand that indexing a word or PDF document requires the use of the 
> ExtractingRequestHandler which uses Apache Tika.
>
>
>
> My question is:  How do I configure the handler so that it can extract the 
> required information from bulk loads of PDFs?
>
> I know I am asking a broad question, but I am struggling to find good 
> guidance and something that would give me a step-by-step approach.
>
>
>
> There is an example configuration in the following link: 
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> I have also seen these threads:
>
> http://stackoverflow.com/questions/5947157/index-search-pdf-content-with-solr
>
> http://www.gossamer-threads.com/lists/lucene/general/158117
>
>
>
> I am still trying to understand the configuration process, so any concrete 
> help would be welcome.
>
>
>
> Thanks,
>
> Sadika Amreen
>
> Data Scientist
>
> PYA Analytics


Re: How to sort over all documents by score after Result Grouping / Field Collapsing

2013-09-24 Thread Erick Erickson
It's not clear what you're trying to do. Do you want to un-group the results?
By that I mean are you trying to take the grouped results you get back and
display them in one flat list ordered by score?

If that's the case, the simplest thing to do would be to do this on
the application
side with the results, it should be quite straight-forward.
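
A sketch of that application-side flattening with SolrJ's grouped-response
objects (assuming fl includes score, group.main=false, and group.limit covers
the docs you need):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.solr.client.solrj.response.Group;
    import org.apache.solr.client.solrj.response.GroupCommand;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class FlattenGroups {
      // pull every doc out of every group, then re-sort the flat list by score
      static List<SolrDocument> flatten(QueryResponse rsp) {
        List<SolrDocument> flat = new ArrayList<SolrDocument>();
        for (GroupCommand cmd : rsp.getGroupResponse().getValues()) {
          for (Group g : cmd.getValues()) {
            flat.addAll(g.getResult());
          }
        }
        Collections.sort(flat, new Comparator<SolrDocument>() {
          public int compare(SolrDocument a, SolrDocument b) {
            return ((Float) b.getFieldValue("score"))
                .compareTo((Float) a.getFieldValue("score"));
          }
        });
        return flat;
      }
    }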

And you have not stated the user-level problem you're trying to solve, this
may be an XY problem.

Best,
Erick

On Mon, Sep 23, 2013 at 1:49 PM, go2jun  wrote:
> Hi, I have solr documents like this:
>
>   <field ... indexed="true"/>
>   <field ... indexed="true"/>
>   <field ... indexed="true"/>
>
> I know I can use Solr Result Grouping / Field Collapsing to get the top 2
> results per group by grouping on source_id. Within each group, documents are
> sorted by score, with a query like this:
> http://localhost:8983/solr/select?q=bank&group.field=source_id&group=true&group.limit=2&group.main=true&sort=score
>
> My question is:
>
> 1. Is it possible to sort over all documents after I do the above grouping?
> 2. Are there any other ways to implement the above (by using Solr
> functions directly)?
> 3. Is it possible to implement this by writing Java code, something like a
> customized request handler?
>
> Thanks in advance,
>
> Jun
>
>
>
>
>


Re: explicit deltaimports by given ids

2013-09-24 Thread Stefan Matheis
Peter

You can access request params that way: ${dataimporter.request.command} (from 
https://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters) - 
although I'm not sure what happens if you provide the same param multiple times.

Perhaps I'd go with &oid=5,6 as the URL param and use ".. WHERE oid IN( 
${dataimporter.request.oid} ) .." in the query?

-Stefan  


On Friday, September 13, 2013 at 3:37 PM, Peter Schütt wrote:

> Hello,
> I want to trigger a deltaimportquery by given IDs.
>  
> Example:
>  
> query="select oid, att1, att2 from my_table"
>  
> deltaImportQuery="select oid, att1, att2 from my_table  
> WHERE oid=${dih.delta.OID}"
>  
> deltaQuery="select OID from my_table WHERE
> TIME_STAMP > TO_DATE
> (${dih.last_index_time:VARCHAR}, 'YYYY-MM-DD HH24:MI:SS')"
>  
> deletedPkQuery="select OID from my_table
> where TIME_STAMP > TO_DATE(${dih.last_index_time:VARCHAR}, 'YYYY-MM-DD
> HH24:MI:SS')"
>  
>  
> Pseudo URL:  
>  
> http://solr-server/solr/mycore/dataimport/?command=deltaImportQuery&&oid=5
> &&oid=6
>  
> to trigger the update or insert of the datasets with OID in (5, 6).
>  
> What is the correct way?
>  
> Thanks for any hint.
>  
> Ciao
> Peter Schütt
>  
>  




Re: SolrCloud High Availability during indexing operation

2013-09-24 Thread Otis Gospodnetic
Is it possible that some of those 80K docs were simply not valid? e.g.
had a wrong field, had a missing required field, anything like that?
What happens if you clear this collection and just re-run the same
indexing process and do everything else the same?  Still some docs
missing?  Same number?

And what if you take 1 document that you know is valid and index it
80K times, with a different ID, of course?  Do you see 80K docs in the
end?
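
Something like this SolrJ loop would run that experiment (a sketch; the URL
and field names are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexTest {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // index one known-good document 80K times under different IDs
        for (int i = 0; i < 80000; i++) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "ha-test-" + i);
          doc.addField("title", "known good document");
          server.add(doc);
        }
        server.commit();  // single commit, matching the original test
      }
    }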

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena  wrote:
> Doc count did not change after I restarted the nodes. I am doing a single
> commit after all 80k docs. Using Solr 4.4.
>
> Regards,
> Saurabh
>
>
> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Interesting. Did the doc count change after you started the nodes again?
>> Can you tell us about commits?
>> Which version? 4.5 will be out soon.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:
>>
>> > Hello,
>> >
>> > I am testing High Availability feature of SolrCloud. I am using the
>> > following setup
>> >
>> > - 8 linux hosts
>> > - 8 Shards
>> > - 1 leader, 1 replica / host
>> > - Using Curl for update operation
>> >
>> > I tried to index 80K documents on replicas (10K/replica in parallel).
>> > During indexing process, I stopped 4 Leader nodes. Once indexing is done,
>> > out of 80K docs only 79808 docs are indexed.
>> >
>> > Is this an expected behaviour ? In my opinion replica should take care of
>> > indexing if leader is down.
>> >
>> > If this is an expected behaviour, any steps that can be taken from the
>> > client side to avoid such a situation.
>> >
>> > Regards,
>> > Saurabh Saxena
>> >
>>


Re: Soft commit and flush

2013-09-24 Thread Otis Gospodnetic
Hi,

I believe data is not fsynched to disk until a hard commit (and even
then disks can lie to you and tell you data is safe even though it's
still in disk cache waiting to really be written to the medium) ,
which is why you can lose it between hard commits.  Soft commits just
make newly added docs visible in search results.
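
In SolrJ terms, a small sketch (the softCommit flag on SolrServer.commit
exists in the 4.x API):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitFlavors {
      public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // soft commit: open a new searcher so recent docs become visible,
        // but don't fsync segments to stable storage
        server.commit(true, true, true);
        // hard commit: fsync index files so the data survives a crash
        server.commit(true, true, false);
      }
    }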

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Tue, Sep 24, 2013 at 7:51 AM, adfel70  wrote:
> I am struggling to get a deep understanding of soft commit.
> I have read  Erick's post
> 
> which helped me a lot with when and why we should call each type of commit.
> But still, I can't understand what exactly happens when we call soft commit:
> I mean, is the new data flushed, fsynced, or held in RAM... ?
> I tried to test it myself and I got 2 different behaviours:
> a. If I just had 1 document that was added to the index, soft commit did not
> cause index files to change.
> b. If I had a big change (addition of about 100,000 docs, ~5MB tlog file),
> calling the soft commit DID change the index files - so I guess that soft
> commit caused an fsync.
>
> My conclusion is that soft commit always flushes the data, but because of
> the implementation of NRTCachingDirectoryFactory, the data will be written
> to the disk when it gets too big.
>
> Can someone please correct me?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Javascript StatelessScriptUpdateProcessor

2013-09-24 Thread Stefan Matheis
Luís, would you mind sharing your findings for others / the archive?

On Tuesday, September 10, 2013 at 6:49 PM, Luís Portela Afonso wrote:

> Solved
> On Sep 10, 2013, at 4:55 PM, Luís Portela Afonso  (mailto:meligalet...@gmail.com)> wrote:
>  
> > Is it possible to execute queries from a JavaScript script in a
> > StatelessScriptUpdateProcessor?
> > I'm processing data with a JavaScript script and I want to execute a query
> > against the data indexed in Solr.
> >
> > I know that the JavaScript script has an instance of SolrQueryRequest and
> > SolrQueryResponse, but neither can be used; at least, I'm not able to
> > use them.
>  
>  
>  




Search statistics in category scale

2013-09-24 Thread Marina
I need to implement further functionality; a picture of it is attached below.
I already have a running application based on Solr search.
In a few words: the drop-down will contain similar search phrases
within a concrete category, along with the number of items found.
Does Solr provide a way to collect such data and retrieve it?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-statistics-in-category-scale-tp4091734.html
Sent from the Solr - User mailing list archive at Nabble.com.


How can i search maximum number of word in particular docs

2013-09-24 Thread Viresh Modi
My query looks like:

start=0&rows=10&hl=true&hl.fl=content&qt=dismax
&q=pookan
&fl=id,application,timestamp,name,score,metaData,metaDataDate
&fq=application:OnlineR3_6_4
&fq=(metaData:channelId/101 OR metaData:channelId/104)
&sort=score desc


but I am not getting the desired results:

 OnlineR3_6_4_101_7
 pookan pookan pookan


OnlineR3_6_4_101_20
 pookan pookan pookan pookan pookan


 OnlineR3_6_4_101_19
 pookan pookan pookan pookan


  OnlineR3_6_4_101_21
 pookan pookan



Actually, I want the document in which the particular word matches the most
times in the content field to come first (relevance-based).


RE: Solr DIH call a java class

2013-09-24 Thread Dyer, James
You probably want to write a custom Transformer.  See: 
http://wiki.apache.org/solr/DIHCustomTransformer

Or maybe a custom Evaluator.  See:  
http://wiki.apache.org/solr/DataImportHandler#Evaluators_-_Custom_formatting_in_queries_and_urls

Possibly one or more of the built-in Transformers will do the job.  See:  
http://wiki.apache.org/solr/DataImportHandler#Transformer
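
As a rough sketch, a custom Transformer is just a class with a transformRow
method; the class and column names here are hypothetical:

  package com.example.dih;

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  // Hypothetical example: trims and upper-cases a "name" column on each row.
  public class TrimNameTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      Object name = row.get("name");
      if (name != null) {
        row.put("name", name.toString().trim().toUpperCase());
      }
      return row; // the returned row is what DIH goes on to index
    }
  }

You then reference it on the entity, e.g.
<entity name="item" transformer="com.example.dih.TrimNameTransformer" ...>.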

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Prasi S [mailto:prasi1...@gmail.com] 
Sent: Tuesday, September 24, 2013 5:26 AM
To: solr-user@lucene.apache.org
Subject: Solr DIH call a java class

Hi,
Can we call a Java class inside a Solr data-config.xml file, similar to
calling a script function?

I have a few manipulations to do before sending data via the DataImportHandler.
For each row, can I pass that row to a Java class in the same way we pass
it to a script function?


Thanks,
Prasi



RE: Using CachedSqlEntityProcessor with delta imports in DIH

2013-09-24 Thread Dyer, James
I think delta imports only work on the parent entity, and cached child entities 
will load in full even if you only need to look up a few rows for the delta.  
Others, though, might have a way to get this to work.

Here's two possible workarounds.

On the child entity, specify:  

When it is a full import, pass the parameter cache.impl=SortedMapBackedCache. 
For delta imports, leave this blank.  This (I think) will give you a cache for 
the full import and no cache for the deltas.

Another workaround is to include a subquery on your delta import like this:
Select * from table ${delta.subquery}
When it is a delta import, pass the parameter: delta.subquery=where 
blah in (select blah from parent_table ...)

This will cause it to cache only the entries needed for that delta import.
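
A sketch of that second workaround (table and column names invented; note
that request parameters are typically referenced through the
dataimporter.request namespace inside data-config.xml):

  <entity name="child"
          processor="CachedSqlEntityProcessor"
          cacheKey="oid" cacheLookup="parent.oid"
          query="select * from child_table ${dataimporter.request.delta.subquery}"/>

and on the delta import you pass the subquery as a (URL-encoded) parameter:

  /dataimport?command=delta-import&delta.subquery=where oid in (select oid from parent_table where ...)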

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: david.r.laroche...@gmail.com [mailto:david.r.laroche...@gmail.com] On 
Behalf Of David Larochelle
Sent: Monday, September 23, 2013 5:22 PM
To: solr-user
Subject: Using CachedSqlEntityProcessor with delta imports in DIH

I'm trying to use the CachedSqlEntityProcessor on a child entity that also
has a delta query.

Full imports and delta imports of the parent entity work fine; however, delta
imports for the child entity have no effect. If I remove the
processor="CachedSqlEntityProcessor" attribute from the child entity, the
delta import works flawlessly but the full import is very slow.
Here's my data-config.xml:



  http://www.w3.org/2001/XInclude"/>
  

  
  

  



I need to be able to run delta imports based on the media_tags_map table in
addition to the story_sentences table.

Any idea why delta imports for media_tags_map won't work when the
CachedSqlEntityProcessor is used?

I've searched extensively but can't find an example that uses both
CachedSqlEntityProcessor and deltaQuery on the sub-entity or any
explanation of why the above configuration won't work as expected.

--

Thanks,

David


RE: solr4.4 admin page show "loading"

2013-09-24 Thread Ramesh
Use Mozilla (Firefox) for better results; even in IE it does not work properly.

-Original Message-
From: William Bell [mailto:billnb...@gmail.com] 
Sent: Tuesday, September 24, 2013 12:02 PM
To: solr-user@lucene.apache.org
Subject: Re: solr4.4 admin page show "loading"

Use Chrome.


On Thu, Sep 19, 2013 at 7:32 AM, Micheal Chao wrote:

> Hi, I have installed Solr 4.4 on Tomcat 7.0. The problem is I can't see 
> the Solr admin page; it always shows "loading". I can't find any 
> error in the Tomcat logs, and I can send search requests and get results.
>
> What can I do? Please help me, thank you very much.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr4-4-admin-page-show-loading-tp4
> 091039.html Sent from the Solr - User mailing list archive at 
> Nabble.com.
>



--
Bill Bell
billnb...@gmail.com
cell 720-256-8076




Soft commit and flush

2013-09-24 Thread adfel70
I am struggling to get a deep understanding of soft commit.
I have read Erick's post, which helped me a lot with when and why we should
call each type of commit. But still, I can't understand what exactly happens
when we call soft commit: I mean, is the new data flushed, fsynced, or held
in RAM?
I tried to test it myself and I got 2 different behaviours: 
a. If I just had 1 document that was added to the index, soft commit did not
cause index files to change.
b. If I had a big change (addition of about 100,000 docs, ~5MB tlog file),
calling the soft commit DID change the index files - so I guess that soft
commit caused an fsync.

My conclusion is that soft commit always flushes the data, but because of
the implementation of NRTCachingDirectoryFactory, the data will be written
to the disk when it gets too big.

Can someone please correct me?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Complex query combining fq and q with join

2013-09-24 Thread marotosg
I found the solution.


http://dzoessolr020:8080/solr4/person/select/?
&q= 
(
( ( GenderSFD:Male )
AND {!join from=PersonID to=CoreID fromIndex=personjob
v='((CoCompanyName:"hospital") OR (PoPositionsAllS:"developer"))'} 

AND {!join from=DocPersonAttachS to=CoreID fromIndex=document v='(DocNameS:"
PeterRES")'} 
)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complex-query-combining-fq-and-q-with-join-tp4091563p4091725.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching within documents

2013-09-24 Thread Nutan
Why does it happen that for some words it shows output and for others it does
not?

For example,
1)
q=contents:Sushant

numfound is 0

q=contents:sushant

gives output

2)
q=contents:acted

numfound 0

q=contents:well

gives output

This is the document:

<doc>
  <field name="id">13</field>
  <field name="author">chetan</field>
  <field name="comments">worst book</field>
  <field name="keywords">solr,lucene</field>
  <field name="contents">Sushant acted well in kaipoche.</field>
  <field name="title">3 mistakes</field>
  <field name="revision_number">0012345654334</field>
</doc>



Please do reply. Help will be appreciated.
Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091713.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr DIH call a java class

2013-09-24 Thread Prasi S
Hi,
Can we call a Java class inside a Solr data-config.xml file, similar to
calling a script function?

I have a few manipulations to do before sending data via the DataImportHandler.
For each row, can I pass that row to a Java class in the same way we pass
it to a script function?


Thanks,
Prasi


RE: searching within documents

2013-09-24 Thread Nutan
Okay thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091705.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: searching within documents

2013-09-24 Thread Gupta, Abhinav
It's not always the case that changing schema.xml requires re-indexing. 
For example, if you add a tokenizer to the query analyzer, you don't need to re-index. 

But in the case below, I suppose your schema changes affect index time, so 
you need to re-index.
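
For example, a field type declared with separate index-time and query-time
analyzers might look like this (a generic sketch, not your actual schema):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Edits under analyzer type="query" take effect on the next query, while edits
under analyzer type="index" only affect documents indexed afterwards, which
is why index-time changes require a full re-index.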

The ordering of documents depends entirely on each document's relevance (score). 
Hope it helps.

Thanks,
Abhinav

-Original Message-
From: Nutan [mailto:nutanshinde1...@gmail.com] 
Sent: 24 September 2013 14:34
To: solr-user@lucene.apache.org
Subject: Re: searching within documents

First I indexed documents by sending docs to Solr as XML files.
Then I made changes to schema.xml, i.e. I added an analyzer and a tokenizer.
I then indexed some new documents using the same procedure; now my searching with 
spaces works only for newly indexed files and not the initial files.
Is it true that after making changes to schema.xml, re-indexing is necessary?

Is it the case that searching for some words works and for others it does not? 
For example, when I query:
q=contents:used

output:numfound=0

and for
q=contents:for
 
output:
 "response":{"numFound":2,"start":0,"docs":[
  {
"id":"7",
"author":["nutan"],
"comments":"best book",
"keywords":"solr,lucene",
"contents":"solr,lucene is used for search based service.",
"title":"solr cookbook 3.1",
"revision_number":"0012345654334"},
  {
"id":"8",
"author":["nutan shinde"],
"comments":"best book for solr",
"keywords":"solr,lucene,apache tika",
"contents":"solr,lucene is used for search based service.Google works 
uses web crawler.Lucene can implelment web crawler",
"title":"solr enterprise search server",
"revision_number":"00123467889767"}]
  }}

my schema.xml is:

<uniqueKey>id</uniqueKey>

Also, for each of the queries:
contents:for
contents:search

the sequence in which documents occur changes. What is the reason for this?
How are the documents retrieved? Does it depend on the number of indexes?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091697.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: searching within documents

2013-09-24 Thread Gora Mohanty
On 24 September 2013 14:34, Nutan  wrote:
> First I indexed documents using "indexing xml files to solr(sending doc to
> solr using xml file)"
> Then I made changes to schema.xml ie. I added analyzer and tokenizer.
> I then indexed some new documents using same procedure,now my searching with
> spaces works only for newly indexed files and not the initial files.
> Is it true that, after making changes to schema.xml re-indexing is
> necessary??
[...]

Yes, it is required.

Regards,
Gora


Re: searching within documents

2013-09-24 Thread Nutan
First I indexed documents by sending docs to Solr as XML files.
Then I made changes to schema.xml, i.e. I added an analyzer and a tokenizer.
I then indexed some new documents using the same procedure; now my searching with
spaces works only for newly indexed files and not the initial files.
Is it true that after making changes to schema.xml, re-indexing is
necessary?

Is it the case that searching for some words works and for others it does not?
For example, when I query:
q=contents:used

output:numfound=0

and for
q=contents:for
 
output:
 "response":{"numFound":2,"start":0,"docs":[
  {
"id":"7",
"author":["nutan"],
"comments":"best book",
"keywords":"solr,lucene",
"contents":"solr,lucene is used for search based service.",
"title":"solr cookbook 3.1",
"revision_number":"0012345654334"},
  {
"id":"8",
"author":["nutan shinde"],
"comments":"best book for solr",
"keywords":"solr,lucene,apache tika",
"contents":"solr,lucene is used for search based service.Google
works uses web crawler.Lucene can implelment web crawler",
"title":"solr enterprise search server",
"revision_number":"00123467889767"}]
  }}

my schema.xml is:

<uniqueKey>id</uniqueKey>


Also, for each of the queries:
contents:for
contents:search

the sequence in which documents occur changes. What is the reason for this?
How are the documents retrieved? Does it depend on the number of indexes?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/searching-within-documents-tp4090173p4091697.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud setup - any advice?

2013-09-24 Thread Neil Prosser
Shawn: unfortunately the current problems are with facet.method=enum!

Erick: We already round our date queries so they're the same for at least
an hour so thankfully our fq entries will be reusable. However, I'll take a
look at reducing the cache and autowarming counts and see what the effect
on hit ratios and performance is.
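
For reference, those knobs live in solrconfig.xml; a trimmed-down sketch
along the lines Erick suggests (sizes illustrative only):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="16"/>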

For SolrCloud our soft commit (openSearcher=false) interval is 15 seconds
and our hard commit is 15 minutes.

You're right about those sorted fields having a lot of unique values. They
can be any number between 0 and 10,000,000 (it's sparsely populated across
the documents) and could appear in several variants across multiple
documents. This is probably a good area for seeing what we can bend with
regard to our requirements for sorting/boosting. I've just looked at two
shards and they've each got upwards of 1000 terms showing in the schema
browser for one (potentially out of 60) fields.



On 21 September 2013 20:07, Erick Erickson  wrote:

> About caches. The queryResultCache is only useful when you expect there
> to be a number of _identical_ queries. Think of this cache as a map where
> the key is the query and the value is just a list of N document IDs
> (internal)
> where N is your window size. Paging is often the place where this is used.
> Take a look at your admin page for this cache, you can see the hit rates.
> But, the take-away is that this is a very small cache memory-wise, varying
> it is probably not a great predictor of memory usage.
>
> The filterCache is more intense memory wise, it's another map where the
> key is the fq clause and the value is bounded by maxDoc/8. Take a
> close look at this in the admin screen and see what the hit ratio is. It
> may
> be that you can make it much smaller and still get a lot of benefit.
> _Especially_ considering it could occupy about 44G of memory.
> (43,000,000 / 8) * 8192. And the autowarm count is excessive in
> most cases from what I've seen. Cutting the autowarm down to, say, 16
> may not make a noticeable difference in your response time. And if
> you're using NOW in your fq clauses, it's almost totally useless, see:
> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>
> Also, read Uwe's excellent blog about MMapDirectory here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> for some problems with over-allocating memory to the JVM. Of course
> if you're hitting OOMs, well.
>
> bq: order them by one of their fields.
> This is one place I'd look first. How many unique values are in each field
> that you sort on? This is one of the major memory consumers. You can
> get a sense of this by looking at admin/schema-browser and selecting
> the fields you sort on. There's a text box with the number of terms
> returned,
> then a / ### where ### is the total count of unique terms in the field.
> NOTE:
> in 4.4 this will be -1 for multiValued fields, but you shouldn't be
> sorting on
> those anyway. How many fields are you sorting on anyway, and of what types?
>
> For your SolrCloud experiments, what are your soft and hard commit
> intervals?
> Because something is really screwy here. Your sharding moving the
> number of docs down this low per shard should be fast. Back to the point
> above, the only good explanation I can come up with from this remove is
> that the fields you sort on have a LOT of unique values. It's possible that
> the total number of unique values isn't scaling with sharding. That is,
> each
> shard may have, say, 90% of all unique terms (number from thin air). Worth
> checking anyway, but a stretch.
>
> This is definitely unusual...
>
> Best,
> Erick
>
>
> On Thu, Sep 19, 2013 at 8:20 AM, Neil Prosser 
> wrote:
> > Apologies for the giant email. Hopefully it makes sense.
> >
> > We've been trying out SolrCloud to solve some scalability issues with our
> > current setup and have run into problems. I'd like to describe our
> current
> > setup, our queries and the sort of load we see and am hoping someone
> might
> > be able to spot the massive flaw in the way I've been trying to set
> things
> > up.
> >
> > We currently run Solr 4.0.0 in the old style Master/Slave replication. We
> > have five slaves, each running Centos with 96GB of RAM, 24 cores and with
> > 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs)
> but
> > aren't slow either. Our GC parameters aren't particularly exciting, just
> > -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.
> >
> > Our index size ranges between 144GB and 200GB (when we optimise it back
> > down, since we've had bad experiences with large cores). We've got just
> > over 37M documents some are smallish but most range between 1000-6000
> > bytes. We regularly update documents so large portions of the index will
> be
> > touched leading to a maxDocs value of around 43M.
> >
> > Query load ranges between 400req/s to 800req/s across the five slaves
> > throughout the day, increasing and decreasing gradually 

Re: Hash range to shard assignment

2013-09-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
That is in the pipeline, within the next 3-4 months for sure.

On Mon, Sep 23, 2013 at 11:07 PM, lochri  wrote:
> Yes, actually that would be a very comfortable solution.
> Is that planned? And if so, when will it be released?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091591.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
-
Noble Paul


Re: requested url solr/update/extract not available on this server

2013-09-24 Thread Nutan
The rest of the queries work, and I have added the following in solrconfig.xml:


<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>




On Sun, Sep 22, 2013 at 8:53 PM, Erick Erickson [via Lucene] <
ml-node+s472066n4091440...@n3.nabble.com> wrote:

> Please review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> Erick
>
> On Sun, Sep 22, 2013 at 5:52 AM, Nutan <[hidden 
> email]>
> wrote:
>
> > I did define the request handler.
> >
> >
> > On Sun, Sep 22, 2013 at 12:51 AM, Erick Erickson [via Lucene] <
> > [hidden email] >
> wrote:
> >
> >> bq: And im not using the example config file
> >>
> >> It looks like you have not included the request handler in your
> >> solrconfig.xml,
> >> something like (from the stock distro):
> >>
> >>   <requestHandler name="/update/extract"
> >>                   startup="lazy"
> >>                   class="solr.extraction.ExtractingRequestHandler" >
> >>     <lst name="defaults">
> >>       <str name="lowernames">true</str>
> >>       <str name="uprefix">ignored_</str>
> >>
> >>       <str name="captureAttr">true</str>
> >>       <str name="fmap.a">links</str>
> >>       <str name="fmap.div">ignored_</str>
> >>     </lst>
> >>   </requestHandler>
> >>
> >> I'd start with the stock config and try removing things one-by-one...
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Sep 21, 2013 at 7:34 AM, Nutan <[hidden email]<
> http://user/SendEmail.jtp?type=node&node=4091391&i=0>>
> >> wrote:
> >>
> >> > Yes, I do get the Solr admin page. And I'm not using the example config
> >> > file; I have created my own for my project as required. I have also
> >> > defined update/extract in solrconfig.xml.
> >> >
> >> >
> >> > On Tue, Sep 17, 2013 at 4:45 AM, Chris Hostetter-3 [via Lucene] <
> >> > [hidden email] >
>
> >> wrote:
> >> >
> >> >>
> >> >> : Is /solr/update working?
> >> >>
> >> >> more importantly: does "/solr/" work in your browser and return
> >> anything
> >> >> useful?  (nothing you've told us yet gives us anyway of knowning if
> >> >> solr is even up and running)
> >> >>
> >> >> if 'http://localhost:8080/solr/' shows you the solr admin UI, and
> you
> >> are
> >> >> using the stock Solr 4.2 example configs, then
> >> >> http://localhost:8080/solr/update/extract should not give you a 404
> >> >> error.
> >> >>
> >> >> if however you are using some other configs, it might not work
> unless
> >> >> those configs register a handler with the path /update/extract.
> >> >>
> >> >> Using the jetty setup provided with 4.2, and the example configs
> (from
> >> >> 4.2) I was able to index a sample PDF just fine using your curl
> >> command...
> >> >>
> >> >> hossman@frisbee:~/tmp$ curl "
> >> >> http://localhost:8983/solr/update/extract?literal.id=1&commit=true";
> -F
> >> >> "myfile=@stump.winners.san.diego.2013.pdf"
> >> >> <?xml version="1.0" encoding="UTF-8"?>
> >> >> <response>
> >> >> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1839</int></lst>
> >> >> </response>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> :
> >> >> : Check solrconfig to see that /update/extract is configured as in
> the
> >> >> standard
> >> >> : Solr example.
> >> >> :
> >> >> : Does /solr/update/extract work for you using the standard Solr
> >> example?
> >> >> :
> >> >> : -- Jack Krupansky
> >> >> :
> >> >> : -Original Message- From: Nutan
> >> >> : Sent: Sunday, September 15, 2013 2:37 AM
> >> >> : To: [hidden email]<
> >> http://user/SendEmail.jtp?type=node&node=4090459&i=0>
> >> >> : Subject: requested url solr/update/extract not available on this
> >> server
> >> >> :
> >> >> : I am working on Solr 4.2 on Windows 7. I am trying to index PDF
> >> >> : files. I referred to Solr Cookbook 4. Tomcat is using port 8080. I
> >> >> : get this error: requested url solr/update/extract not available on
> >> >> : this server
> >> >> : When my curl is:
> >> >> : curl "
> >> http://localhost:8080/solr/update/extract?literal.id=1&commit=true";
> >> >> -F
> >> >> : "myfile=@cookbook.pdf"
> >> >> : There is no entry in log files. Please help.
> >> >> :
> >> >> :
> >> >> :
> >> >> : --
> >> >> : View this message in context:
> >> >> :
> >> >>
> >>
> http://lucene.472066.n3.nabble.com/requested-url-solr-update-extract-not-available-on-this-server-tp4090153.html
> >> >> : Sent from the Solr - User mailing list archive at Nabble.com.
> >> >> :
> >> >>
> >> >> -Hoss
> >> >>
> >> >>

RE: SolrParams to and from NamedList

2013-09-24 Thread Peter Kirk
Hi - I think there is a bug in the conversion methods for SolrParams. But it 
seems that using ModifiableSolrParams (to add and remove parameters and values, 
which is what I want to do) is the way to go.
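
A sketch of that approach inside handleRequestBody (the parameter names are
just examples; ModifiableSolrParams lives in org.apache.solr.common.params):

  ModifiableSolrParams newParams = new ModifiableSolrParams(req.getParams());
  newParams.set("rows", "20");           // replace any existing value(s)
  newParams.add("fq", "author:nutan");   // append an additional value
  newParams.remove("debugQuery");        // drop a parameter entirely
  req.setParams(newParams);
  super.handleRequestBody(req, rsp);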

/Peter


-Original Message-
From: Peter Kirk [mailto:p...@alpha-solutions.dk] 
Sent: 23. september 2013 21:09
To: solr-user@lucene.apache.org
Subject: SolrParams to and from NamedList

Hi,

In a request handler, if I run the code below, I get an exception from Solr: 
undefined field "[Ljava.lang.String;@41061b68"

It appears the conversion between SolrParams and NamedList and back again fails 
if one of the parameters is an array, e.g. a parameter that appears with 
multiple values, such as category and author.


public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) 
throws Exception {

  SolrParams params = req.getParams();
  // multi-valued parameters end up as String[] entries in the NamedList
  NamedList<Object> parameterList = params.toNamedList();
  // toSolrParams() appears to call toString() on each value, mangling arrays
  SolrParams newSolrParams = SolrParams.toSolrParams(parameterList);

  req.setParams(newSolrParams);
  super.handleRequestBody(req, rsp);
}


How can I make this conversion work correctly?

Thanks.


Re: Get only those documents that are fully satisfied.

2013-09-24 Thread Andre Bois-Crettez

(Your schema and query only appear on the nabble.com forum; they are mostly
empty for me on the mailing list.)

What you want is probably to change OR to AND:

params.set("q.op", "AND");
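
or, equivalently, on the request itself (an illustrative query):

  q=blue running shoes&q.op=AND

which matches only documents containing all three terms.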


André
On 09/23/2013 04:44 PM, asuka wrote:

Hi Jack,
I've been working with the following schema field analyzer:



Regarding the query, the one I'm using right now is:



But with this query, I just get results based on the presence of any of the
words in the sentence.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531p4091565.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
André Bois-Crettez

Software Architect
Search Developer
http://www.kelkoo.com/


Kelkoo SAS
A simplified joint-stock company (Société par Actions Simplifiée)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier, 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively for 
their addressees. If you are not the intended recipient of this message, please 
delete it and notify the sender.